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International filing date (day/month/year) 

18 September 1998 (18.09.98) 



Applicant 



Priority date (day/month/year) 

19 September 1997 (19.09.97) 



McGILL UNIVERSITY et al 



1. Notice is hereby given that the International Bureau has communicated, as provided in Article 20, the international application 
to the following designated Offices on the date indicated above as the date of mailing of this Notice: 

AU,BR,CN,EP,IL,JP,KP,KR,US 

In accordance with Rule 47.1(c), third sentence, those Offices will accept the present Notice as conclusive evidence that 
the communication of the international application has duly taken place on the date of mailing indicated above and no copy 
of the international application is required to be furnished by the applicant to the designated Office(s). 

2. The following designated Offices have waived the requirement for such a communication at this time: 

AL,AM,AP,AT,AZ,BA,BB,BG,BY,CA,CH,CU,CZ,DE,DK,EA,EE,ES,FI,GB,GE,GH,GM,HR,HU,ID,IS, 
KE,KG,KZ,LC,LK,LR,LS,LT,LU,LV,MD,MG,MK,MN,MW,MX,NO,NZ,OA,PL,PT,RO,RU,SD,SE,SG,SI, 
SK,SL,TJ,TM,TR,TT,UA,UG,UZ,VN,YU,ZW 

The communication will be made to those Offices only upon their request. Furthermore, those Offices do not require the 
applicant to furnish a copy of the international application (Rule 49.1(a-bis)). 

3. Enclosed with this Notice is a copy of the international application as published by the International Bureau on 
01 April 1999 (01.04.99) under No. WO 99/15639 

REMINDER REGARDING CHAPTER II (Article 31(2)(a) and Rule 54.2) 

If the applicant wishes to postpone entry into the national phase until 30 months (or later in some Offices) from the priority 
date, a demand for international preliminary examination must be filed with the competent International Preliminary 
Examining Authority before the expiration of 19 months from the priority date. 

It is the applicant's sole responsibility to monitor the 19-month time limit. 

Note that only an applicant who is a national or resident of a PCT Contracting State which is bound by Chapter II has the 
right to file a demand for international preliminary examination. 
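
The time limits in this reminder are counted in calendar months from the priority date. The following is a minimal sketch of that arithmetic, assuming the priority date of 19 September 1997 stated in this Notice; the helper name `add_months` is hypothetical, and the sketch ignores any extension that may apply when a period expires on a non-working day.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day (e.g. 31 January
    plus one month), the result is clamped to the last day of that month.
    """
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))


# Priority date as stated in this Notice (assumed input for the sketch).
priority_date = date(1997, 9, 19)

# 19 months: time limit for filing the demand under Chapter II.
demand_deadline = add_months(priority_date, 19)

# 30 months: national phase entry when a demand was filed in time.
national_phase_deadline = add_months(priority_date, 30)

print(demand_deadline.isoformat())
print(national_phase_deadline.isoformat())
```

With the priority date above, the sketch gives 19 April 1999 for the 19-month limit and 19 March 2000 for the 30-month limit.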

REMINDER REGARDING ENTRY INTO THE NATIONAL PHASE (Article 22 or 39(1)) 

If the applicant wishes to proceed with the international application in the national phase, he must, within 20 months 
or 30 months, or later in some Offices, perform the acts referred to therein before each designated or elected Office. 

For the time limits and acts to be performed for entering the national phase, see the 
Annex to Form PCT/IB/301 (Notification of Receipt of Record Copy) and Volume II of the PCT Applicant's Guide. 
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amendments under Article 19 has not yet expired and the International Bureau had received neither such amendments nor a 
declaration that the applicant does not wish to make amendments. 






Form PCT/IB/308 (continuation sheet) (July 1996) 






PCT/CA98/00884 



PATENT COOPERATION TREATY 

From the INTERNATIONAL BUREAU 



PCT 

NOTIFICATION OF ELECTION 

(PCT Rule 61.2) 


To: 

United States Patent and Trademark 
Office 
(Box PCT) 
Crystal Plaza 2 
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International application No. 
PCT/CA98/00884 


Applicant's or agent's file reference 
1770-182PCT 


International filing date (day/month/year) 
18 September 1998 (18.09.98) 


Priority date (day/month/year) 

19 September 1997 (19.09.97) 


Applicant 

ROULEAU, Guy, A. et al 



1. The designated Office is hereby notified of its election made: 

☒ in the demand filed with the International Preliminary Examining Authority on: 

15 April 1999 (15.04.99) 



□ in a notice effecting later election filed with the International Bureau on: 
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made before the expiration of 19 months from the priority date or, where Rule 32 applies, within the time limit under 
Rule 32.2(b). 
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INTERNATIONAL PRELIMINARY EXAMINATION REPORT 

(PCT Article 36 and Rule 70) 



Applicant's or agent's file reference 
1770-182PCT 


FOR FURTHER ACTION: See Notification of Transmittal of International 
Preliminary Examination Report (Form PCT/IPEA/416) 


International application No. 
PCT/CA98/00884 


International filing date (day/month/year) 
18/09/1998 


Priority date (day/month/year) 
19/09/1997 


International Patent Classification (IPC) or national classification and IPC 
C12N15/00 


Applicant 

McGILL UNIVERSITY et al. 



1. This international preliminary examination report has been prepared by this International Preliminary Examining Authority 
and is transmitted to the applicant according to Article 36. 

2. This REPORT consists of a total of 6 sheets, including this cover sheet. 

☒ This report is also accompanied by ANNEXES, i.e. sheets of the description, claims and/or drawings which have 
been amended and are the basis for this report and/or sheets containing rectifications made before this Authority 
(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT). 

These annexes consist of a total of 3 sheets. 



3. This report contains indications relating to the following items: 

I ☒ Basis of the report 

II □ Priority 

III □ Non-establishment of opinion with regard to novelty, inventive step and industrial applicability 

IV □ Lack of unity of invention 

V ☒ Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability; 
citations and explanations supporting such statement 

VI ☒ Certain documents cited 

VII □ Certain defects in the international application 

VIII ☒ Certain observations on the international application 
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15/04/1999 


Date of completion of this report 


Name and mailing address of the international 
preliminary examining authority: 

European Patent Office 

D-80298 Munich 
Tel. +49 89 2399 - 0 Tx: 523656 epmu d 
Fax: +49 89 2399 - 4465 


Authorized officer 

Merlos-Lange, A.M. 

Telephone No. +49 89 2399 8559 
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INTERNATIONAL PRELIMINARY 
EXAMINATION REPORT 



International application No. PCT/CA98/00884 



I. Basis of the report 

1. This report has been drawn on the basis of (substitute sheets which have been furnished to the receiving Office in 
response to an invitation under Article 14 are referred to in this report as "originally filed" and are not annexed to 
the report since they do not contain amendments.): 

Description, pages: 

1-24 as originally filed 

Claims, No.: 

1-12 as received on 09/10/1999 with letter of 05/10/1999 

Drawings, sheets: 

1/9-9/9 as originally filed 

2. The amendments have resulted in the cancellation of: 

□ the description, pages: 

□ the claims, Nos.: 

□ the drawings, sheets: 

3. □ This report has been established as if (some of) the amendments had not been made, since they have been 
considered to go beyond the disclosure as filed (Rule 70.2(c)): 

4. Additional observations, if necessary: 

see separate sheet 
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V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial 
applicability; citations and explanations supporting such statement 



1. Statement 



Novelty (N)                      Yes: Claims 1-12 
                                 No:  Claims 

Inventive step (IS)              Yes: Claims 1-12 
                                 No:  Claims 

Industrial applicability (IA)    Yes: Claims 1-6, 8-11 
                                 No:  Claims 



2. Citations and explanations 
see separate sheet 

VI. Certain documents cited 

1. Certain published documents (Rule 70.10) 
and / or 

2. Non-written disclosures (Rule 70.9) 
see separate sheet 

VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 

see separate sheet 
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1) . Section I, Point 4, add. observation 

The sequence listing is shown in additional sheets 1/4 to 4/4. 

2). Section VI 

The IPER is established on the opinion that the present application enjoys a valid 
priority. In case of an invalid priority of 09.09.1997, document "Europ. J. of 
Human Genetics, January 1998, 6, 89-94, Philibert, R. A. et al." may become 
relevant for the assessment of novelty of claim 1 when the application enters the 
regional phase. 

3). For the assessment of the present claims 7 and 12 on the question whether they 
are industrially applicable, no unified criteria exist in the PCT Contracting States. 
The patentability can also be dependent upon the formulation of the claims. The 
EPO, for example, does not recognize as industrially applicable the subject-matter 
of claims to the use of a compound in medical treatment, but may allow, however, 
claims to a known compound for first use in medical treatment and the use of such 
a compound for the manufacture of a medicament for a new medical treatment. 

Claims 7 and 12 relate to subject-matter considered by this Authority to be 
covered by the provisions of Rule 67.1(iv) PCT. Consequently, no opinion will be 
formulated with respect to the industrial applicability of the subject-matter of these 
claims (Article 34(4)(a)(i) PCT). 

4). New claim 9 now refers to "a method of categorizing psychiatric patients 
according to their genotype to maximize response to treatment patients", 
which does however not appear to be supported in the original disclosure. 
With respect to the "use" claims 10 to 12 it is noted that they also appear broader 
than originally filed claims 5, 6 and 7 insofar as they are not dependent on these 
claims. Therefore the new claims are not limited to the use of determined allelic 
variants being obtained from a nucleic acid sample of a patient according to claim 
4 (original claim 5) or to the use of a non-human mammal model for screening of 
therapeutic agents according to claim 7 (original claim 8). 



Form PCT/Separate Sheet/409 (Sheet 1) (EPO-April 1997) 



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In view of this, said claims are not considered to conform with the requirements of 
Art. 34 (2)(b) PCT. 

5). Section VIII 

When considering claim 1, it is not clear whether it is directed to a (wild type?) 
hGT1 gene comprising the sequence as set forth in Figs. 3 and 4A to 4C and 
containing transcribed polymorphic CAG repeat or whether it is directed to the 
particular allelic CAG repeat variants thereof. Furthermore, in the absence of the 
complete definition of the allelic CAG repeat variant as given on page 5, lines 26-32 
or lines 11 to 16, the claim is not rendered more clear. The definition of "alleles 
-3, -2, -1, 0, and 1" is an arbitrary one introduced by the applicant and therefore 
meaningless to the skilled person unless the full meaning is included in the claims. 
Finally, it would appear that claim 1 refers to human GT1 (hGT1). However, 
reference to Fig. 3 which shows a human and a mouse GT1 sequence, introduces 
some doubt whether the mouse sequence should be involved in the scope of 
claim 1 or not. 

In view of the above, the dependent claims 2-12 are not clearly defined in the 
sense of Art. 6 PCT as well. 

With respect to claim 8 it is further noted that it appears to be incomplete (A 
method to identify genes part of or interacting with a biochemical ...). Moreover, 
the claimed method is not defined by particular procedure steps to identify a gene 
which forms part of or which interacts with a biochemical pathway affected by the 
hGT1 gene. Screening of samples with probes or primers derived from the (wild 
type?) hGT1 sequence does not appear to result in the identification of the 
desired gene which interacts for example with the biochemical pathway but rather 
in the identification of allelic variants of hGT1 gene. As already mentioned above, 
it is not clear whether claim 9 refers to the (wildtype) hGT1 gene of claim 1 or to 
particular allelic CAG repeat variants thereof. 

6). Section V 

None of the available prior art discloses or suggests means and methods as 
described in the present application which therefore appears to conform with the 
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requirements of novelty and inventive step according to Art. 33(2), (3) PCT. 
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Category ° Citation of document, with indication, where appropriate, of the relevant passages 



Relevant to claim No. 



IMAI, Y. ET AL.: "Cloning of a retinoic 
acid-induced gene, GT1, in the embryonal 
carcinoma cell line P19: neuron-specific 
expression in the mouse brain" 
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see the whole document 

ROBITAILLE, Y. ET AL.: "The 
neuropathology of CAG repeat diseases: 
review and update of genetic and molecular 
features" 
BRAIN PATHOLOGY, 
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see the whole document 
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"E" earlier document but published on or after the international 
filing date 
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which is cited to establish the publication date of another 
citation or other special reason (as specified) 

"O" document referring to an oral disclosure, use, exhibition or 
other means 

"P" document published prior to the international filing date but 
later than the priority date claimed 



"T" later document published after the international filing date 
or priority date and not in conflict with the application but 
cited to understand the principle or theory underlying the 
invention 

"X" document of particular relevance; the claimed invention 
cannot be considered novel or cannot be considered to 
involve an inventive step when the document is taken alone 

"Y" document of particular relevance; the claimed invention 
cannot be considered to involve an inventive step when the 
document is combined with one or more other such documents, 
such combination being obvious to a person skilled 
in the art. 

"&" document member of the same patent family 
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I. Basis of the report 

1. This report has been drawn on the basis of (substitute sheets which have been furnished to the receiving Office in 
response to an invitation under Article 14 are referred to in this report as "originally filed" and are not annexed to 
the report since they do not contain amendments.): 

Description, pages: 

1-24 as originally filed 

Claims, No.: 

1-12 as received on 09/10/1999 with letter of 05/10/1999 

Drawings, sheets: 

1/9-9/9 as originally filed 

2. The amendments have resulted in the cancellation of: 

□ the description, pages: 

□ the claims, Nos.: 

□ the drawings, sheets: 

3. □ This report has been established as if (some of) the amendments had not been made, since they have been 
considered to go beyond the disclosure as filed (Rule 70.2(c)): 

4. Additional observations, if necessary: 

see separate sheet 
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V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial 
applicability; citations and explanations supporting such statement 
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Statement 



Novelty (N)                      Yes: Claims 1-12 
                                 No:  Claims 

Inventive step (IS)              Yes: Claims 1-12 
                                 No:  Claims 

Industrial applicability (IA)    Yes: Claims 1-6, 8-11 
                                 No:  Claims 



2. Citations and explanations 
see separate sheet 

VI. Certain documents cited 

1. Certain published documents (Rule 70.10) 
and / or 

2. Non-written disclosures (Rule 70.9) 
see separate sheet 

VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 

see separate sheet 
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1) . Section I. Point 4, add. observation 

The sequence listing is shown in additional sheets 1/4 to 4/4. 

2). Section VI 

The IPER is established on the opinion that the present application enjoys a valid 
priority. In case of an invalid priority of 09.09.1997, document "Europ. J. of 
Human Genetics, January 1998, 6, 89-94, Philibert, R. A. et al." may become 
relevant for the assessment of novelty of claim 1 when the application enters the 
regional phase. 


3). For the assessment of the present claims 7 and 12 on the question whether they 
are industrially applicable, no unified criteria exist in the PCT Contracting States. 
The patentability can also be dependent upon the formulation of the claims. The 
EPO, for example, does not recognize as industrially applicable the subject-matter 
of claims to the use of a compound in medical treatment, but may allow, however, 
claims to a known compound for first use in medical treatment and the use of such 
a compound for the manufacture of a medicament for a new medical treatment. 

Claims 7 and 12 relate to subject-matter considered by this Authority to be 
covered by the provisions of Rule 67.1(iv) PCT. Consequently, no opinion will be 
formulated with respect to the industrial applicability of the subject-matter of these 
claims (Article 34(4)(a)(i) PCT). 

4). New claim 9 now refers to "a method of categorizing psychiatric patients 
according to their genotype to maximize response to treatment patients", 
which does however not appear to be supported in the original disclosure. 
With respect to the "use" claims 10 to 12 it is noted that they also appear broader 
than originally filed claims 5, 6 and 7 insofar as they are not dependent on these 
claims. Therefore the new claims are not limited to the use of determined allelic 
variants being obtained from a nucleic acid sample of a patient according to claim 
4 (original claim 5) or to the use of a non-human mammal model for screening of 
therapeutic agents according to claim 7 (original claim 8). 
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In view of this, said claims are not considered to conform with the requirements of 
Art. 34 (2)(b) PCT. 

5). Section VIII 

When considering claim 1, it is not clear whether it is directed to a (wild type?) 
hGT1 gene comprising the sequence as set forth in Figs. 3 and 4A to 4C and 
containing transcribed polymorphic CAG repeat or whether it is directed to the 
particular allelic CAG repeat variants thereof. Furthermore, in the absence of the 
complete definition of the allelic CAG repeat variant as given on page 5, lines 26-32 
or lines 11 to 16, the claim is not rendered more clear. The definition of "alleles 
-3, -2, -1, 0, and 1" is an arbitrary one introduced by the applicant and therefore 
meaningless to the skilled person unless the full meaning is included in the claims. 
Finally, it would appear that claim 1 refers to human GT1 (hGT1). However, 
reference to Fig. 3 which shows a human and a mouse GT1 sequence, introduces 
some doubt whether the mouse sequence should be involved in the scope of 
claim 1 or not. 

In view of the above, the dependent claims 2-12 are not clearly defined in the 
sense of Art. 6 PCT as well. 

With respect to claim 8 it is further noted that it appears to be incomplete (A 
method to identify genes part of or interacting with a biochemical ...). Moreover, 
the claimed method is not defined by particular procedure steps to identify a gene 
which forms part of or which interacts with a biochemical pathway affected by the 
hGT1 gene. Screening of samples with probes or primers derived from the (wild 
type?) hGT1 sequence does not appear to result in the identification of the 
desired gene which interacts for example with the biochemical pathway but rather 
in the identification of allelic variants of hGT1 gene. As already mentioned above, 
it is not clear whether claim 9 refers to the (wildtype) hGT1 gene of claim 1 or to 
particular allelic CAG repeat variants thereof. 

6). Section V 

None of the available prior art discloses or suggests means and methods as 
described in the present application which therefore appears to conform with the 
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The embodiments of the invention in which an exclusive 
property or privilege is claimed are defined as follows: 

1. A hGT1 gene containing transcribed polymorphic 
CAG repeat, which comprises a sequence as set forth in 
Fig. 3 and Figs. 4A-4C, wherein allelic variants of CAG 
repeat are selected from the group consisting of 
alleles -3, -2, -1, 0 and 1, and wherein said allelic 
variants are associated with schizophrenia, affective 
disorders, neurodevelopmental brain diseases or with 
phenotypic variability with respect to long term 
response to neuroleptic medication. 

2. The gene of claim 1, wherein said affective 
disorder is manic depression. 

3. A method for the prognosis of severity of 
schizophrenia of a patient, which comprises the steps 
of: 

a) obtaining a nucleic acid sample of said patient; 
and 

b) determining allelic variants of CAG repeat of 
the gene of claim 1, and wherein allelic 
variants shorter than allele 0 are indicative of 
non-severe schizophrenia. 

4. A method for the identification of patient 
responding to neuroleptic medication, which comprises 
the steps of: 

a) obtaining a nucleic acid sample of said patient; 
and 

b) determining allelic variants of CAG repeat of 
the gene of claim 1, and wherein allelic 
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variants shorter than allele 0 are indicative of 
neuroleptic response. 



5. The method of claim 4, wherein said shorter 
allelic variants have from about 171 to about 177 bp in 
length. 

6. A non-human mammal model for the hGT1 gene of 
claim 1, whose germ cells and somatic cells are 
transformed and expresses at least one allelic variant 
of the hGT1 gene and wherein said allelic variant of 
the hGT1 being introduced into the mammal, or an 
ancestor of the mammal, at an embryonic stage. 

7. A method for the screening of therapeutic agents 
for the prevention and/or treatment of schizophrenia, 
which comprises the steps of: 

a) administering said therapeutic agents to the 
non-human mammal of claim 6 or schizophrenia 
patients; and 

b) evaluating the prevention and/or treatment of 
development of schizophrenia in said mammal or 
said patients. 

8. A method to identify genes part of or interacting 
with a biochemical pathway affected by hGT1 gene, 
which comprises the steps of: 

a) designing probes and/or primers using the hGT1 
gene of claim 1 and screening psychiatric 
patients samples with said probes and/or primers; 
and 

b) evaluating the identified gene role in psychiatric 
patients. 
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9. A method of categorizing psychiatric patients 

according to their genotype to maximize response to 
treatment patients, which comprises the steps of: 

a) obtaining a nucleic acid sample of said 
patients; and 

b) determining allelic variants of CAG repeat of 
the gene of claim 1, wherein patients are 
categorized with respect to their allelic variants 
and wherein allelic variants shorter than 
allele 0 are indicative of neuroleptic response. 



10. The use of the determination of allelic variants 
of CAG repeat of the gene of claim 1 for the 
identification of patient responding to neuroleptic 
medication, wherein allelic variants shorter than 
allele 0 are indicative of neuroleptic response. 

11. The use of claim 10, wherein said shorter allelic 
variants have from about 171 to about 177 bp in 
length. 

12. The use of the model of claim 6 for the 
screening of therapeutic agents for the manufacture of 
a medicament for prevention and/or treatment of 
schizophrenia. 
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The embodiments of the invention in which an exclusive 
property or privilege is claimed are defined as follows: 

1. A hGT1 gene containing transcribed polymorphic 
CAG repeat, which comprises a sequence as set forth in 
Fig. 3 and Figs. 4A-4C. 

2. The gene of claim 1, wherein allelic variants of 
CAG repeat are associated with schizophrenia, affective 
disorders, neurodevelopmental brain diseases or with 
phenotypic variability with respect to long term 
response to neuroleptic medication. 

3. The gene of claim 2, wherein said affective 
disorder is manic depression. 

4. A method for the prognosis of severity of 
schizophrenia of a patient, which comprises the steps 
of: 

a) obtaining a nucleic acid sample of said patient; 
and 

b) determining allelic variants of CAG repeat of 
the gene of claim 1, and wherein short allelic 
variants are indicative of non-severe 
schizophrenia. 

5. A method for the identification of patient 
responding to neuroleptic medication, which comprises 
the steps of: 

a) obtaining a nucleic acid sample of said patient; 
and 

b) determining allelic variants of CAG repeat of 
the gene of claim 1, and wherein short allelic 
variants are indicative of neuroleptic response. 
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6. The method of claim 5, wherein said short alle- 
lic variants have from about 171 to about 177 bp in 
length. 

7. A non-human mammal model for the hGT1 gene of 
claim 1, whose germ cells and somatic cells are modified 
to express at least one allelic variant of the 
hGT1 gene and wherein said allelic variant of the hGT1 
being introduced into the mammal, or an ancestor of the 
mammal, at an embryonic stage. 

8. A method for the screening of therapeutic agents 
for the prevention and/or treatment of schizophrenia, 
which comprises the steps of: 

a) administering said therapeutic agents to the 
non-human mammal of claim 7 or schizophrenia 
patients; and 

b) evaluating the prevention and/or treatment of 
development of schizophrenia in said mammal or 
said patients. 

9. A method to identify genes part of or interacting 
with a biochemical pathway affected by hGT1 gene, 
which comprises the steps of: 

a) designing probes and/or primers using the hGT1 
gene of claim 1 and screening psychiatric 
patients samples with said probes and/or primers; 
and 

b) evaluating the identified gene role in psychiatric 
patients. 



10. A method of stratifying psychiatric patients 
based on the allelic variants of the hGT1 gene of claim 
1 for clinical trials purposes, which comprises: 
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a) obtaining a nucleic acid sample of said 
patients; and 

b) determining allelic variants of CAG repeat of 
the gene of claim 1, wherein patients are 
stratified with respect to their allelic variants 
and wherein short allelic variants are 
indicative of neuroleptic response. 
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_ sheets. 



☒ It is also accompanied by a copy of each prior art document cited in this report. 



1. Certain claims were found unsearchable (see Box I). 

2. □ Unity of invention is lacking (see Box II). 



3. ☒ The international application contains disclosure of a nucleotide and/or amino acid sequence listing and the 
international search was carried out on the basis of the sequence listing 
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(57) Abstract 

The present invention provides an isolated DNA molecule of the autosomal dominant spinocerebellar ataxia type 1 gene which is 
located within the short arm of chromosome 6. This isolated DNA molecule is preferably located within a 3.36 kb EcoRI fragment, i.e., an 
EcoRI fragment containing about 3360 base pairs, of the SCA1 gene. The isolated sequences contain a CAG repeat region. The number of 
CAG trinucleotide repeats (n) is ≤ 36, preferably n = 19-36, for normal individuals. For an affected individual n > 36, preferably n > 43. 
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riFNF SKOU ENrF FOR SP TNOCKREBELLAR ATAXIA TYPE 1 
AND METHOD FOR DIAGNOSIS 

Statement of Government Rights 

The present invention was made with government support under 
Grant Nos. NS 22920 and 27699, awarded by the National Institutes of Health. The 
Government has certain rights in this invention. 

Background of the Invention 

The spinocerebellar ataxias are a heterogeneous group of 
degenerative neurological disorders with variable clinical features resulting from 
degeneration of the cerebellum, brain stem, and spinocerebellar tracts. The clinical 
symptoms include ataxia, dysarthria, ophthalmoparesis, and variable degrees of 
motor weakness. The symptoms usually begin during the third or fourth decade of 
life; however, juvenile onset has been identified. Typically, the disease worsens 
gradually, often resulting in complete disability and death 10-20 years after the 
onset of symptoms. Individuals with juvenile onset spinocerebellar ataxias, 
however, typically have more rapid progression of the phenotype than the late onset 
cases. A method for diagnosing spinocerebellar ataxias would provide a significant 
step toward its treatment. 

Spinocerebellar ataxia type 1 (SCA1) is an autosomal dominant 
disorder which is genetically linked to the short arm of chromosome 6 based on 
linkage to the human major histocompatibility complex (HLA). See, for example, 
H. Yakura et al., N. Engl. J. Med., 291, 154-155 (1974); and J.F. Jackson et al., N. 
Engl. J. Med., 296, 1138-1141 (1977). SCA1 has been shown to be tightly linked to 
the marker D6S89 on the short arm of chromosome 6, telomeric to HLA. See, for 
example, L.P.W. Ranum et al., Am. J. Hum. Genet., 49, 31-41 (1991); and H.Y. 
Zoghbi et al., Am. J. Hum. Genet., 49, 23-30 (1991). Recently, two families with 
dominantly inherited ataxia failed to show detectable linkage with HLA markers but 
were found to have SCA1 when studied for linkage to D6S89, demonstrating the 
superiority of the latter marker for study of ataxia families. See, for example, B.J.B. 
Keats et al., Am. J. Hum. Genet., 49, 972-977 (1991). The identification and 
cloning of the SCA1 gene could provide methods of detection that would be 
extremely valuable for both family counseling and planning medical treatment. 
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Summary of the I nvention 

The present invention is directed to a portion of an isolated 1.2-Mb 
region of DNA from the short arm of chromosome 6 containing a highly 
polymorphic CAG repeat region in the SCA1 gene. This CAG repeat region is 
unstable (i.e., highly variable within a population) and is expanded in individuals 
with the autosomal dominant neurodegenerative disorder spinocerebellar ataxia type 
1 (i.e., affected individuals generally have more than 36 CAG repeats). Southern 
and PCR analyses of the CAG repeat region demonstrate correlation between the 
size of the expanded repeat region and the age-of-onset of the disorder (with larger 
alleles, i.e., more repeat units, occurring in juvenile cases), and severity of the 
disorder (with larger alleles, i.e., more repeat units, occurring in the more severe 
cases). 

Specifically, the present invention provides a nucleic acid molecule 
containing a CAG repeat region of an isolated autosomal dominant spinocerebellar 
ataxia type 1 gene (herein referred to as "SCA1"), which is located within the short 
arm of chromosome 6. The SCA1 gene contains a region that encodes a protein 
herein referred to as "ataxin-1." The nucleic acid molecule of the present invention 
can be a single or a double-stranded polynucleotide. It can be genomic DNA, 
cDNA, or mRNA of any size as long as it includes the CAG repeat region of an 
isolated SCA1 gene. Preferably, the nucleic acid molecule includes the SCA1 
coding region and is of about 2.4-11 kb in length. It can be the entire SCA1 gene 
(whether genomic DNA or a transcript thereof) or any fragment thereof that contains 
the CAG region of the gene. One such fragment is an EcoRI fragment of the SCA1 
gene, i.e., a fragment obtained through digestion with EcoRI endonuclease 
restriction enzyme, containing about 3360 base pairs having therein a polymorphic 
CAG repeat region. By polymorphic CAG repeat region it is meant that there are 
repeating CAG trinucleotides in this portion of the gene that can vary in the number 
of CAG trinucleotides. The number of trinucleotide repeats can vary from as few as 
19, for example, to as many as 81, for example, and larger. 

For a normal individual, n ≤ 36 in the (CAG)n region, i.e., n = 2-36,
and typically n = 19-36. This region in a normal allele of the SCA1 gene is
optionally interrupted with CAT trinucleotides. Typically, there are no more than
about 3 CAT trinucleotides, either individually or in combination, within any
(CAG)n region. The (CAG)n region of this isolated sequence is unstable, i.e., highly
variable within a population, and larger, i.e., expanded, in individuals who have
symptoms of the disease, or who are likely to develop symptoms of the disease. For
an affected individual, i.e., an individual with an affected allele of the SCA1 gene,
n > 36 in the (CAG)n region, and typically n > 43. One isolated DNA molecule of
the SCA1 gene is about 3360 base pairs in length as shown in Figure 1. The
sequences of a portion of the EcoRI fragment within the SCA1 gene of several
affected individuals are shown in Figure 2. The entire 10,660 nucleotides of the
SCA1 gene transcript are shown in Figure 15 (the entire SCA1 gene spans about 450
kb of genomic DNA).
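
The repeat-number thresholds above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a diagnostic procedure: the function names are hypothetical, interrupting CAT units are counted as part of the tract, and the single cut-off of 36 units is taken directly from the text.

```python
import re

NORMAL_MAX_REPEATS = 36  # threshold stated in the text: normal alleles have n <= 36


def longest_repeat_tract(seq: str) -> int:
    """Number of triplets in the longest uninterrupted run of CAG/CAT units.

    Assumption: CAT interruptions are counted as repeat units.
    """
    runs = re.findall(r"(?:CAG|CAT)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_allele(n_repeats: int) -> str:
    """Apply the n <= 36 / n > 36 cut-off given in the text."""
    return "normal" if n_repeats <= NORMAL_MAX_REPEATS else "expanded"
```

For a sequenced allele, `classify_allele(longest_repeat_tract(sequence))` would give the category; alleles close to the cut-off would in practice call for more careful interpretation than this sketch provides.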

The present invention is also directed to isolated oligonucleotides,
particularly primers for use in PCR techniques and probes for diagnosing the
neurodegenerative disorder SCA1. The oligonucleotides have at least about 11
nucleotides and hybridize to a nucleic acid molecule containing a CAG repeat
region of an isolated SCA1 gene. The hybridization can occur to any portion of a
nucleic acid molecule containing a CAG repeat region of the SCA1 gene.
Preferably, the oligonucleotides hybridize to a 3.36-kb EcoRI fragment of an SCA1
gene having a CAG repeat region. Alternatively stated, each oligonucleotide is
substantially complementary (having greater than 65% homology) to a nucleotide
sequence having a CAG repeat region, i.e., a (CAG)n region, preferably to a 3.36-kb
EcoRI fragment of the SCA1 gene. If the oligonucleotide is a primer, the molecule
preferably contains at least about 16 nucleotides and no more than about 35
nucleotides. Furthermore, preferred primers are chosen such that they produce a
primed product of about 70-350 base pairs, preferably about 100-300 base pairs.
More preferably, the primers are chosen such that the nucleotide sequence is
complementary to a portion of a strand of an affected or a normal allele within about
50 nucleotides on either side of the (CAG)n region, including directly adjacent to
the (CAG)n region. Most preferably, the primer is selected from the group
consisting of CCGGAGCCCTGCTGAGGT (CAG-a), CCAGACGCCGGGACAC
(CAG-b), AACTGGAAATGTGGACGTAC (Rep-1),
CAACATGGGCAGTCTGAG (Rep-2), CCACCACTCCATCCCAGC (GCT-435),
TGCTGGGCTGGTGGGGGG (GCT-214), CTCTCGGCTTTCTTGGTG (Pre-1),
and GTACGTCCACATTTCCAGTT (Pre-2). These primers substantially
correspond to those shown in Figure 3. They can be used in any combination for
sequencing or producing amplified nucleic acid molecules, e.g., DNA molecules,
using various PCR techniques. Preferably, for amplification of the DNA molecule
characteristic of the SCA1 disorder, Rep-1 and Rep-2 are the primer pair used. As
used herein, the term "amplified DNA molecule" refers to DNA molecules that are
copies of a portion of DNA and its complementary sequence. The copies correspond
in nucleotide sequence to the original DNA sequence and its complementary
sequence. The term "complement", as used herein, refers to a DNA sequence that is
complementary (having greater than 65% homology) to a specified DNA sequence.
The term "primer pair", as used herein, means a set of primers including a 5'
upstream primer that hybridizes with the 5' end of the DNA molecule to be amplified
and a 3' downstream primer that hybridizes with the complement of the 3' end of the
molecule to be amplified.
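
As a quick consistency check on the primers listed above, the following Python sketch tabulates them, tests each against the preferred length of about 16 to 35 nucleotides, and computes reverse complements. The dictionary and helper names are illustrative assumptions; the sequences are those recited in the text.

```python
# Primer sequences as recited in the text (5' to 3').
PRIMERS = {
    "CAG-a": "CCGGAGCCCTGCTGAGGT",
    "CAG-b": "CCAGACGCCGGGACAC",
    "Rep-1": "AACTGGAAATGTGGACGTAC",
    "Rep-2": "CAACATGGGCAGTCTGAG",
    "GCT-435": "CCACCACTCCATCCCAGC",
    "GCT-214": "TGCTGGGCTGGTGGGGGG",
    "Pre-1": "CTCTCGGCTTTCTTGGTG",
    "Pre-2": "GTACGTCCACATTTCCAGTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def preferred_length(seq: str, lo: int = 16, hi: int = 35) -> bool:
    """True if the primer length falls in the preferred 16-35 nt window."""
    return lo <= len(seq) <= hi
```

One observation from such a check is that Pre-2 is the exact reverse complement of Rep-1, so the two primers anneal to opposite strands at the same site.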

Using the primers of the present invention, PCR technology can be
used in the diagnosis of the neurological disorder SCA1 by detecting a region of
greater than about 36 CAG repeating trinucleotides, preferably at least 43 repeating
CAG trinucleotides. Generally, this involves treating separate complementary
strands of the DNA molecule containing a region of repeating CAG codons with a
molar excess of two oligonucleotide primers, extending the primers to form
complementary primer extension products which act as templates for synthesizing
the desired molecule containing the CAG repeating units, and detecting the
molecule so amplified.
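
Because each repeat unit adds three base pairs to the amplified product, the repeat number can be estimated from the product size once the combined length of the non-repeat flanking sequence (primers included) is known. The sketch below assumes that flank length is supplied by the user; no flank length for any particular primer pair is given here, and the values used with it are hypothetical.

```python
def repeats_from_amplicon(amplicon_bp: int, flank_bp: int) -> int:
    """Estimate the number of trinucleotide units in a PCR product.

    amplicon_bp: measured product size in base pairs.
    flank_bp: total non-repeat sequence in the product (an assumed input).
    """
    tract_bp = amplicon_bp - flank_bp
    if tract_bp < 0:
        raise ValueError("amplicon is shorter than the flanking sequence")
    return round(tract_bp / 3)


def exceeds_cutoff(amplicon_bp: int, flank_bp: int, cutoff: int = 36) -> bool:
    """True if the estimated repeat number is greater than the cut-off."""
    return repeats_from_amplicon(amplicon_bp, flank_bp) > cutoff
```

With a hypothetical flank of 120 bp, a 210-bp product corresponds to 30 units and a 250-bp product to about 43 units.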

An oligonucleotide that can be used as a gene probe for identifying a
nucleic acid molecule, e.g., a DNA molecule, containing a CAG repeat region of the
SCA1 gene is also provided. The gene probe can be used for distinguishing
between the normal and the larger affected alleles of the SCA1 gene. The gene
probe can be a portion of a nucleotide sequence of the SCA1 gene itself (e.g., a
3.36-kb EcoRI fragment or portion thereof), complementary to it, or hybridizable to
it or the complement. It is of a size suitable for forming a stable duplex, i.e., having
at least about 11 nucleotides, preferably having at least about 15 nucleotides, more
preferably having at least about 100 nucleotides (for effective Southern blotting),
and most preferably having at least about 200 nucleotides. The probe can contain
any portion of the (CAG)n region, although this is not a requirement. It is desirable,
however, for the probe to contain a portion of the nucleic acid molecule on either
side of the (CAG)n region. There is generally no maximum size limitation for such
probes. In fact, the entire SCA1 gene could be a probe.

The gene probe of the present invention is useable in a method of
diagnosing a patient for SCA1. A particularly preferred method of diagnosis
involves detecting the presence of a DNA molecule containing a CAG repeat region
of the SCA1 gene. Specifically, the method includes the steps of digesting genomic
DNA with a restriction endonuclease to obtain DNA fragments; preferably,
separating the fragments by size using gel electrophoresis; probing said DNA
fragments under hybridizing conditions with a detectably labeled gene probe that
hybridizes to a nucleic acid molecule containing a CAG repeat region of an isolated
SCA1 gene; detecting probe DNA which has hybridized to said DNA fragments;
and analyzing the DNA fragments for a (CAG)n region characteristic of the normal
or affected forms of the SCA1 gene.
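
The digestion, size-separation, and probing steps can be mimicked on a sequence string. The following Python sketch is a simplified in-silico analogue only: it uses the EcoRI recognition sequence GAATTC (cut after the first G), treats "hybridization" as exact substring matching, and uses hypothetical function names.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cleaves between G and AATTC on the top strand


def digest(seq: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut a linear sequence at every occurrence of the recognition site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def fragments_with_repeat(fragments: list[str], min_units: int = 2) -> list[str]:
    """Return fragments containing at least min_units consecutive CAG units,
    sorted by size (a stand-in for gel separation followed by probing)."""
    probe = "CAG" * min_units
    return sorted((f for f in fragments if probe in f), key=len)
```

Comparing the length of the repeat-containing fragment between samples then plays the role of comparing band sizes on the blot.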

The present invention also provides a protein (or portions thereof)
encoded by the SCA1 gene and antibodies (polyclonal or monoclonal) produced
from the protein or portions thereof. The antibodies can be used in methods of
isolating antigenic protein expressed by the SCA1 gene. For example, they can be
added to a biological sample containing the antigenic protein to form an antibody-
antigen complex, which can be isolated from the sample and exposed to amino acid
sequencing of the antigenic protein. This can be done while the protein is still
complexed with the antibody.

Thus, the present invention provides methods to determine the
presence or absence of an affected form of the SCA1 gene, which can be based on
RNA- or DNA-based detection methods (preferably, the methods involve isolating
and analyzing genomic DNA) or on protein-based detection methods. These
methods include, for example, PCR-based methods, direct nucleic acid sequencing,
measuring expression of the SCA1 gene by measuring the amount of mRNA
expressed or by measuring the amount of ataxin-1 protein expressed. The methods
of the present invention also include determining the size of the repeat region of the
nucleic acid or amino acid molecules.

As used herein, the term "isolated (and purified)" means that the
nucleic acid molecule, gene, or oligonucleotide is essentially free from the
remainder of the human genome and associated cellular or other impurities. This
does not mean that the product has to have been extracted from the human genome;
rather, the product could be a synthetic or cloned product, for example. As used
herein, the term "nucleic acid molecule" means any single or double-stranded RNA
or DNA molecule, such as mRNA, cDNA, and genomic DNA.

As used herein, the term "SCA1 gene" means the
deoxyribopolynucleotide located within the short arm of chromosome 6 between
markers D6S89 and D6S274 of about 450 kb (10.5-11 kb transcript) containing an
unstable CAG repeat region. This term, therefore, refers to numerous unique genes
that are substantially the same except for the content of the CAG repeat region. A
representative example of the SCA1 gene transcript for a normal individual is shown
in Figure 15. Included within the scope of this term is any ribo- or deoxyribo-
polynucleotide containing zero, one or more nucleotide substitutions that also
encodes the protein ataxin-1. Included in the term "SCA1 gene" is any
polynucleotide as described in the previous sentence that has different numbers of
CAG and/or CAT repeats in the polymorphic CAG repeat region. It is understood
also that the term "SCA1 gene" includes both the polypeptide-encoding region and
the regions that encode the 5' and 3' untranslated segments of the mRNA for SCA1.
Although the SCA1 gene described herein is described in terms of the human
genome, it is envisioned that other mammals, e.g., mice, may also have a very
similar gene containing a CAG repeat region that could be used to produce
oligonucleotides, for example, that are useful in diagnosing the SCA1 disorder in
humans.

As used herein, the term "ataxin-1" means the gene product of the
SCA1 gene, i.e., protein encoded by the open reading frame of the SCA1 gene and
any protein substantially equivalent thereto, including all proteins of different
lengths (e.g., 20-90 kD, preferably 60-90 kD) encoded by said open reading frame
which start at each in-frame ATG translation start site. The term "ataxin-1" further
includes all proteins with essentially the same N-terminal and C-terminal sequences
but different numbers of glutamine (Q) and/or histidine (H) repeats (primarily
glutamine repeats) in the polymorphic repeat region.
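
The relationship between the CAG/CAT units of the repeat region and the glutamine/histidine residues of the protein can be shown with a minimal Python sketch. The function name is hypothetical, and only the two codons discussed in the text are handled; any other codon raises an error.

```python
# Only the two codons discussed in the text; both assignments are from the
# standard genetic code (CAG = glutamine, CAT = histidine).
CODON_TO_RESIDUE = {"CAG": "Q", "CAT": "H"}


def translate_repeat_tract(tract: str) -> str:
    """Translate an in-frame CAG/CAT tract into one-letter amino acid codes."""
    tract = tract.upper()
    if len(tract) % 3 != 0:
        raise ValueError("tract length must be a multiple of three")
    residues = []
    for i in range(0, len(tract), 3):
        codon = tract[i:i + 3]
        if codon not in CODON_TO_RESIDUE:
            raise ValueError(f"unexpected codon {codon!r} in repeat tract")
        residues.append(CODON_TO_RESIDUE[codon])
    return "".join(residues)
```

A tract of three CAG units, one CAT unit, and two further CAG units thus corresponds to the peptide QQQHQQ.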

As used herein, the term "polymorphic CAG repeat region" or simply
"CAG repeat region" means that region of the SCA1 gene that encodes a string of
glutamine residues that varies in number from individual allele to individual
allele, and which can range in number from 2 to 80 or more. Moreover, the
polymorphic CAG repeat regions can contain CAT (encoding histidine) in place of
CAG, although CAT is much less common than CAG in this region. It is to be
understood that when referring to nucleic acid molecules containing the CAG repeat
region, this includes RNA molecules containing the corresponding GUC repeat
region.

WO 95/01437    PCT/US94/07336

As used herein, an "affected" gene refers to the allele of the SCA1
gene that, when present in an individual, is the cause of spinocerebellar ataxia type
1, and an "affected" individual has the symptoms of autosomal dominant
spinocerebellar ataxia type 1. Individuals with only "normal" SCA1 genes do not
possess the symptoms of SCA1. The term "allele" means a genetic variation
associated with a coding region; that is, an alternative form of the gene.

As used herein, "hybridizes" means that the oligonucleotide forms a
noncovalent interaction with the target nucleic acid molecule under standard
stringency conditions. The hybridizing oligonucleotide may contain nonhybridizing
nucleotides that do not interfere with forming the noncovalent interaction, e.g., a
restriction enzyme recognition site to facilitate cloning.

Brief Description of the Drawings

Figure 1. Sequence of the 3.36-kb EcoRI fragment of the normal
SCA1 gene located within the short arm of chromosome 6. It is within this
fragment that mutations occur in the CAG repeat region which are associated with
autosomal dominant spinocerebellar ataxia type 1.

Figure 2. Sequence information for five affected individuals in the
CAG repeat region, i.e., the CAG trinucleotide repeat, and its flanking regions of the
SCA1 gene located within the short arm of chromosome 6.

Figure 3. Sequence of the CAG trinucleotide repeat and its flanking
regions. About 500 nucleotides in a single strand of DNA of the 3.36-kb EcoRI
fragment of the SCA1 gene shown in Figure 1 are represented. The locations of PCR
primers are shown by solid lines with arrowheads.

Figure 4. Summary of SCA1 recombination events that led to the
precise mapping of the SCA1 locus. Recombinant disease-carrying chromosomes
are shown for the markers shown above. A schematic diagram of the relevant
region of 6p22 (not drawn to scale) is shown at the top of the figure. Families are
coded as follows: TX = Houston, MN = Minnesota, MI = Michigan, IT = Italy.
Each recombination event is given a number following the family code.

Figure 5. Regional localization of 6p22-p23 STSs by PCR analysis
of radiation reduced hybrids. Three panels (a-c) demonstrate the regional
localization of D6S274, D6S288, and AM10GA. In each panel PCR amplification
results are shown for genomic DNA, the 1-7 cell line which retains 6p, the radiation
reduced hybrids R17, R72, R86, and R54, and RJK88 hamster DNA. A blank
control (c) is shown for every panel. R86 has been previously shown to retain
D6S89; R17 and R72 are known to contain D6S88 and D6S108, two DNA markers
which map centromeric to D6S89. An amplification product is seen in 1-7, R17,
R72, and R86 for D6S274 and D6S288, whereas the amplification product for
AM10GA is only seen in 1-7 and R86, confirming that D6S274 and D6S288 map
centromeric to AM10GA and D6S89.

Figure 6. A schematic diagram of the 6p22-p23 region showing the new
markers and the YAC contig. At the bottom of the diagram, the radiation hybrid
reduced panel used for regional mapping is shown. YAC clones are represented as
dark lines; open segments indicate a noncontiguous region of DNA. The
discontinuity shown in YAC clone 351B10 indicates that this YAC has an internal
deletion. All of the ends of the YAC clones that were isolated are designated by an
"L" for the left end or an "R" for the right end.

Figure 7. Genotypic data for 6p22-p23 dinucleotide repeat markers
are shown for a reduced pedigree from the MN-SCA1 kindred. This figure
summarizes a second recombination event that led to the precise mapping of the
SCA1 locus.

Figure 8. Long-range restriction maps of YACs 227B1, 60H7,
195B5, A250D5, and 379G2. YACs 351B10, 172B5, 172B5, and 168F1 were also
used in the restriction analysis (data not shown). The restriction sites are marked as
N, NotI; B, BssHII; Nr, NruI; M, MluI; S, SacII; and Sa, SalI. A summary map of
the SCA1 gene region with the position of the DNA markers used as probes (boxes)
is shown. The centromere-telomere orientation is indicated by cen/tel respectively.

Figure 9. Physical map of the SCA1 region. The positions of
various genetic markers and sequence tagged sites (STSs) relative to the overlapping
YAC clones are shown. AM10 and FLB1 are STSs developed using a radiation
reduced hybrid retaining chromosome 6p22-p23; A205D5-L and 195B5-L are STSs
from insert termini of YACs A250D5 and 195B5. D6S89, D6S109, D6S288,
D6S274, and AM10GA are dinucleotide repeat markers used in the genetic analysis
of SCA1 families. The SCA1 candidate region is flanked by the D6S274 and
D6S89 markers, which identify the closest recombination events. The YAC clones
shown here are indicated by the cross-hatched markings. YAC 172B5 has two non-
contiguous segments of DNA, as indicated by the open bar for the non-6p segment.
The YACs are designated according to St. Louis and CEPH libraries. The position
of the cosmid contig (C), which contains the overlapping cosmids which are (CAG)n
positive, is indicated by a solid black bar. The overlap between the YACs was
determined by long-range restriction analysis. Orientation is indicated as
centromeric (Cen) and telomeric (Tel).

Figure 10. Southern blot analysis of leukocyte DNA using the 3.36-kb
EcoRI fragment which contains the repeat as a probe. Figure 10a: TaqI-digested
DNA from a TX-SCA1 kindred. The unaffected spouse has a single
fragment at 2830 bp. The affected individual with onset at 25 years of age has the
2830-bp fragment as well as a 2930-bp fragment. The affected child with onset at 4
years inherited the normal 2830-bp fragment from her mother, and has a new
fragment of 3000 bp not seen in either parent. Figure 10b: TaqI-digested DNA from
individuals from a MN-SCA1 kindred. The unaffected spouse and the unaffected
sibling have a 2830-bp fragment. The two affected brothers have the 2830-bp
fragment as well as an expanded fragment of 2900 bp in the sib with onset at 25
years and 2970 bp in the sib with onset at 9 years. Figure 10c: BstNI-digested
DNA from the TX-SCA1 kindred. Lanes 1-3 are from the same kindred depicted in
Figure 10a. The normal fragment size is 530 bp; in individuals with onset at 25-30
years (lanes 1 and 4) the fragment expands to 610 bp. In the individual with onset at
15 years of age (lane 7) the fragment size is 640 bp, and in the individual with onset
at 4 years (lane 3) the fragment size is 680 bp. The DNA in lane 5 is from a
14-year-old child who is asymptomatic.

Figure 11. Analysis of the PCR-amplified products containing the
trinucleotide repeat tract in normal and SCA1 individuals. The CAG-a/CAG-b
primer pair was used in panel (a) whereas the Rep-1/Rep-2 primer pair was used in
panel (b). The individuals in lanes 1, 2 and 3 in panel (a) are brothers. The range
for the normal (NL) and expanded (EXP) (CAG)n repeat units is indicated.

Figure 12. A scatter plot for the age-at-onset in years versus the
number of the (CAG)n repeat units is shown to demonstrate the correlation between
the age-at-onset and the size of the expansion. A linear correlation coefficient of
-0.845 was obtained. In addition, a curvilinear correlation coefficient was calculated
given the non-linear pattern of the plot. The curvilinear correlation coefficient is
-0.936.

Figure 13. Schematic representation of the SCA1 cDNA contig. A
subset of overlapping phage cDNA clones (black bars) and 5'-RACE-PCR product
(R1) spanning 10.66 kb of the SCA1 transcript is shown. cDNA clone 31-5
contains the entire coding region for the SCA1 gene product, ataxin-1. On top, a
schematic shows the structure of the SCA1 transcript; the sizes of the coding region
(rectangle) as well as the 5'UTR and the 3'UTR (thin lines) are indicated. The
position of the CAG repeat within the coding region is also shown. An asterisk
indicates the clones used as probes to screen the cDNA libraries. At the bottom the
positions of BamHI (B), HindIII (H), and TaqI (T) restriction sites are shown.

Figure 14. Northern blot analysis of the SCA1 gene using RNAs
from multiple human tissues. The panel on the left is probed with a PCR product
from a portion of the coding region (bp 2460 to bp 3432). The panel on the right is
hybridized with the 3 J cDNA clone from the 3'UTR. An ~11 kb transcript is
detected in RNAs from all tissues using both probes as well as the cDNA clones
31-5 and 8-8, both of which contain the CAG repeat (Figure 13).

Figure 15. The sequence of the SCA1 transcript. The sequences of
primers 9b, 5F and 5R (bp 129-147, bp 173-191 and bp 538-518, respectively, in the
5' to 3' orientation) are underlined. The protein sequence encoded by the DNA is
shown below the DNA sequence. The CAG repeat region is from about bp 1524 to
about bp 1613.

Figure 16. a. The structure of the SCA1 transcript and the various
splice variants. The schematic on top represents the nine exons (not drawn to scale)
and their respective sizes. The stippled areas indicate the coding region. The
structures of five cDNA clones representing different splice variants of the SCA1
transcript are also shown. Clones 8-8 and 8-9b are phage clones; RT-PCR1 and
RT-PCR2 are two clones obtained by RT-PCR carried out on cerebellar poly-(A)+
RNA using the primers 9b and 5R (Figure 15). Only 30 bp of exon 1 were present
in clone 8-9b and RT-PCR products, as indicated by the broken line in the
rectangles. b. Detection of alternative splicing of the SCA1 transcript in cerebellar
poly-(A)+ RNA (CBL RNA). RT-PCR analysis was carried out using two sets of
primers: 9b-5R and 5F-5R. PCR products of the expected size were detected in
CBL RNA in the presence of reverse transcriptase (+RT) with both pairs of primers.
Using the 9b-5R pair, at least two larger PCR products were also detected. Using the
5F-5R pair for RT-PCR at annealing T < 60°, some faint bands in the same size
range as those seen using the 9b-5R primer pair were also seen. 8-8 and 8-9b are the
phage clones used as positive controls. The sizes of the relevant bands of the
molecular weight marker (ΦX174 cut with HaeIII) are indicated on the left.

Figure 17. Intron-exon boundaries of the SCA1 gene. Splice
acceptor and splice donor sites are indicated in bold letters. The numbers at the
beginning and the end of each exon refer to the position in the composite sequence
of SCA1 in Figure 15. Uppercase letters indicate exon sequences; lowercase letters
indicate intron sequences. Y = pyrimidine; R = purine; N = undefined.

Figure 18. Genomic structure of the SCA1 gene. The nine exons of
the SCA1 gene (solid rectangles, not drawn to scale) were localized based on the
restriction map of the SCA1 region by Southern analysis using rare cutter DNA
digests from several YAC clones. A representative map using YAC clone 227B1,
which encompasses the SCA1 gene, is shown. The restriction map of this YAC has
been confirmed by analysis of four overlapping YAC clones in the region. The
centromere-telomere orientation is indicated by CEN-TEL, respectively. L = left
YAC end; R = right YAC end; B = BssHII; C = CspI; M = MluI; N = NotI; Nr = NruI;
S = SacII.

Figure 19. Analysis of expression of the expanded SCA1 allele.
RT-PCR was carried out on lymphoblast poly-(A)+ RNA from one unaffected
individual (lane 1) and four SCA1 patients (lanes 2 through 5) using primers Rep-1
and Rep-2. This analysis shows that both the normal and the expanded SCA1 alleles
are transcribed. The number of the repeat units for each allele is indicated below
each lane; lane 6 is the RT minus control.

Figure 20. Distributions of CAG repeat lengths from unaffected
control individuals and from SCA1 alleles. Normal alleles range in size from 19 to
36 repeat units while disease alleles contain from 42 to 81 repeats.

Detailed Description

Substantial efforts have been made to localize the SCA1 gene using
genetic and physical mapping methods. Genetically, SCA1 is flanked on the
centromeric side by D6S88 at a recombination fraction of approximately 0.08
(based on marker-marker distances using the Centre d'Etude du Polymorphisme
Humain (CEPH) reference families) and on the telomeric side by F13A at a
recombination fraction of 0.19. See L.P.W. Ranum et al., Am. J. Hum. Genet., 49,
31-41 (1991). Both markers are quite distant and are not practical for use in efforts
aimed at cloning the SCA1 gene. The D6S89 marker maps closer to the SCA1
gene.

To localize SCA1 more precisely, five dinucleotide polymorphisms
near D6S89 have been identified. A new marker, AM10GA, demonstrates no
recombination with SCA1. Linkage analysis and analysis of recombination events
confirm that SCA1 maps centromeric to D6S89, with D6S109 as the other flanking
marker at the centromeric end, and establish the following order:
centromere-D6S109-AM10GA/SCA1-D6S89-LR40-D6S202-telomere. The genetic
distance between the two flanking markers D6S109 and D6S89 is about 6.7 cM based
on linkage analysis using 40 reference families from the Centre d'Etude du
Polymorphisme Humain (CEPH).

A. SCA1 Gene and Method of Diagnosis

The size of the candidate region on the short arm of chromosome 6
containing the SCA1 locus is about 1.2 Mb, and is flanked by D6S274 to the
centromeric side and D6S89 to the telomeric side. The SCA1 gene spans 450 kb of
genomic DNA and is organized in nine exons (Figure 15 is representative of the
SCA1 gene from a normal individual). The SCA1 transcript (i.e., mRNA or cDNA
clone) is about 10.6-11 kb. The gene is transcribed in both normal and affected
SCA1 alleles. The structure of the gene is unusual in that it contains seven exons in
the 5'-untranslated region, two large exons (2080 bp and 7805 bp) which contain a
2448-bp coding region, and a 7277-bp 3'-untranslated region. The first four
non-coding exons undergo extensive alternative splicing in several tissues.
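
The sizes quoted above can be cross-checked with simple arithmetic. The Python sketch below assumes that the 10,660-nucleotide transcript length given for Figure 15 is exact, that the seven small exons are entirely 5'-untranslated, and that the two large exons are the last two; the function and key names are illustrative only.

```python
def transcript_breakdown(
    transcript_nt: int = 10_660,          # transcript length stated for Figure 15
    large_exons_bp: tuple = (2080, 7805),  # the two large exons
    coding_bp: int = 2448,                # coding region
    utr3_bp: int = 7277,                  # 3'-untranslated region
) -> dict:
    """Derive the sizes implied by the stated figures (assumed to be exact)."""
    utr5_bp = transcript_nt - coding_bp - utr3_bp
    small_exons_bp = transcript_nt - sum(large_exons_bp)
    return {
        "utr5_bp": utr5_bp,                                  # total 5'UTR
        "seven_small_exons_bp": small_exons_bp,              # exons upstream of the large pair
        "utr5_in_large_exons_bp": utr5_bp - small_exons_bp,  # 5'UTR carried by the large exons
        "codons": coding_bp // 3,                            # includes a stop codon if it is counted
        "coding_in_frame": coding_bp % 3 == 0,
    }
```

Under these assumptions the figures are mutually consistent: the 5'-untranslated region comes to 935 nucleotides, of which 775 lie in the seven small exons, and the 2448-bp coding region divides evenly into 816 codons.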

The gene for SCA1 contains a highly polymorphic CAG repeat that
is located within a 3.36-kb fragment produced by digestion of the candidate region
with the restriction enzyme EcoRI. The CAG repeat region preferably lies within
the coding region and codes for polyglutamine. This region of CAG repeating
sequences is unstable and expanded in individuals with SCA1. Southern and PCR
analyses of the (CAG)n repeat demonstrate a correlation between the size of the
repeat expansion and the age-at-onset of SCA1 and severity of the disorder. That is,
individuals with more repeat units (or longer repeat tracts) tend to have both an early
age of onset and a more severe disease course. These results demonstrate that SCA1,
like fragile X syndrome, myotonic dystrophy, X-linked spinobulbar muscular
atrophy, and Huntington disease, displays a mutational mechanism involving
expansion of an unstable trinucleotide repeat.

The identification of a trinucleotide repeat expansion associated with
SCA1 allows for improved diagnosis of the disease. Thus, in addition to being
directed to the gene for SCA1 and the protein encoded thereby, the present invention
also relates to methods of diagnosing SCA1. These diagnostic methods can involve
any known method for detecting a specific fragment of DNA. These methods can
include direct detection of the DNA or indirect detection through RNA or
proteins, for example. For example, Southern or Northern blotting hybridization
techniques using labeled probes can be used. Alternatively, PCR techniques can be
used with novel primers that amplify the CAG repeating region of the EcoRI
fragment. Nucleic acid sequencing can also be used as a direct method of
determining the number of CAG repeats.

For example, DNA probes can be used for identifying DNA
segments of the affected allele of the SCA1 gene. DNA probes are segments of
labeled, single-stranded DNA which will hybridize, or noncovalently bind, with
complementary single-stranded DNA derived from the gene sought to be identified.
The probe can be labeled with any suitable label known to those skilled in the art,
including radioactive and nonradioactive labels. Typical radioactive labels include
³²P, ¹²⁵I, and ³⁵S. Nonradioactive labels include, for example, ligands such
as biotin or digoxigenin as well as enzymes such as phosphatase or peroxidases, or
the various chemiluminescers such as luciferin, or fluorescent compounds like
fluorescein and its derivatives. The probe may also be labeled at both ends with
different types of labels for ease of separation, as, for example, by using an isotopic
label at one end and a biotin label at the other end.

Using DNA probe analysis, the target DNA can be derived by the
enzymatic digestion, fractionation, and denaturation of genomic DNA to yield a
complex mixture incorporating the DNA from many different genes, including DNA
from the short arm of chromosome 6, which includes the SCA1 locus. A specific
DNA gene probe will hybridize only with DNA derived from its target gene or gene
fragment, and the resultant complex can be isolated and identified by techniques
known in the art.

In general, for detecting the presence of a DNA sequence located
within the SCA1 gene, the genomic DNA is digested with a restriction endonuclease
to obtain DNA fragments. The source of genomic DNA to be tested can be any
biological specimen that contains DNA. Examples include specimens of blood,
semen, vaginal swabs, tissue, hair, and body fluids. The restriction endonuclease
can be any that will cut the genomic DNA into fragments of double-stranded DNA
having a particular nucleotide sequence. The specificities of numerous
endonucleases are well known and can be found in a variety of publications, e.g.,
Maniatis et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor
Laboratory: New York (1982). That manual is incorporated herein by reference in
its entirety. Preferred restriction endonuclease enzymes include EcoRI, TaqI, and
BstNI. EcoRI is particularly preferred.

Diagnosis of the disease can alternatively involve the use of the
polymerase chain reaction sequence amplification method (PCR) using novel
primers. U.S. Patent No. 4,683,195 (Mullis et al., issued July 28, 1987) describes a
process for amplifying, detecting and/or cloning nucleic acid sequences. The
method involves treating extracted DNA to form single-stranded complementary
strands, treating the separate complementary strands of DNA with two
oligonucleotide primers, extending the primers to form complementary extension
products that act as templates for synthesizing the desired nucleic acid molecule,
and detecting the amplified molecule. More specifically, the method steps of
treating the DNA with primers and extending the primers include the steps of:
adding a pair of oligonucleotide primers, wherein one primer of the pair is
substantially complementary to part of the sequence in the sense strand and the other
primer of each pair is substantially complementary to a different part of the same
sequence in the complementary antisense strand; annealing the paired primers to the
complementary molecule; simultaneously extending the annealed primers from a 3'
terminus of each primer to synthesize an extension product complementary to the
strands annealed to each primer, wherein said extension products after separation
from the complement serve as templates for the synthesis of an extension product
for the other primer of each pair; and separating said extension products from said
templates to produce single-stranded molecules. Variations of the method are
described in U.S. Patent No. 4,683,194 (Saiki et al., issued July 28, 1987). The
polymerase chain reaction sequence amplification method is also described by Saiki
et al., Science, 230, 1350-1354 (1985) and Scharf et al., Science, 224, 163-166
(1986). The discussion of these techniques in each of these references is
incorporated herein by reference.
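
The primer geometry described above (one primer matching the sense strand, the other complementary to a site further along it) can be illustrated with a small in-silico sketch. This is only an illustration under simplifying assumptions: exact matching stands in for annealing, and the function names and the toy template are hypothetical and are not taken from the cited patents.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def amplicon(template: str, forward: str, reverse: str):
    """Return the region bounded by the two primers, or None.

    `template` is the sense strand written 5'->3'. The forward primer is
    expected to match the template directly; the reverse primer anneals to
    the sense strand, so its reverse complement is searched for downstream
    of the forward site. Exact matching is an assumption of this sketch.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy template: flanking sequence around five CAG units.
toy_template = "TTTT" + "ACGTACGTACGA" + "CAG" * 5 + "GGATCCGATTGC" + "AAAA"
toy_forward = "ACGTACGTACGA"
toy_reverse = reverse_complement("GGATCCGATTGC")
product = amplicon(toy_template, toy_forward, toy_reverse)
```

Because each extension product serves as a template in the next round, the number of copies of the bounded region can at most double per cycle; the sketch only locates that region and does not model yield.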

The primers are oligonucleotides, either synthetic or naturally
occurring, capable of acting as a point of initiating synthesis of a product
complementary to the region of the DNA sequence containing the CAG repeating
trinucleotides of the SCA1 locus of the short arm of chromosome 6. The primer
includes a nucleotide sequence substantially complementary to a portion of a strand
of an affected or a normal allele of a fragment (preferably a 3.36 kb EcoRI
fragment) of an SCA1 gene having a (CAG)n region. The primer sequence has at
least about 11 nucleotides, preferably at least about 16 nucleotides and no more than
about 35 nucleotides. The primers are chosen such that they produce a primed
product of about 70-350 base pairs, preferably about 100-300 base pairs. More
preferably, the primers are chosen such that the nucleotide sequence is substantially
complementary to a portion of a strand of an affected or a normal allele within about
150 nucleotides on either side of the (CAG)n region, including directly adjacent to
the (CAG)n region.
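
The size limits stated above can be expressed as a simple screening check. The sketch below is only an illustration: the text qualifies every limit with "about", so the hard cutoffs are a simplifying assumption, and the function names are hypothetical.

```python
def check_primer_length(primer: str) -> dict:
    """Classify a primer by length using the limits stated in the text.

    Acceptable: at least about 11 and no more than about 35 nucleotides.
    Preferred: at least about 16 and no more than about 35 nucleotides.
    Treating "about" as an exact cutoff is an assumption of this sketch.
    """
    length = len(primer)
    return {
        "length": length,
        "acceptable": 11 <= length <= 35,
        "preferred": 16 <= length <= 35,
    }


def check_product_size(size_bp: int) -> dict:
    """Classify a primed product by size (about 70-350 bp, preferably 100-300)."""
    return {
        "size_bp": size_bp,
        "acceptable": 70 <= size_bp <= 350,
        "preferred": 100 <= size_bp <= 300,
    }
```

For instance, a 12-nucleotide primer would pass the broader limit but not the preferred one, and an 80 bp product would likewise be acceptable without being preferred.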

Examples of preferred primers are shown by solid lines with
arrowheads in Figure 3. The primers are thus selected from the group consisting of
GGGGAGGGGTGGTGAGGT (CAG-a), GGAGAGGGGGGGAGAC (CAG-b),
AAGTGGAAATGTGGAGGTAG (Rep-1), GAAGATGGGGAGTGTGAG (Rep-2),
GGAGGAGTGGATGGGAGG (GCT-435), TGGTGGGGTGGTGGGGGG (GCT-214),
GTGTGGGGTTTGTTGGTG (Pre-1), and GTAGGTGGAGATTTGGAGTT
(Pre-2). These primers can be used in various combinations or with any other
primer that can be designed to hybridize to a portion of DNA of a fragment
(preferably a 3.36 kb EcoRI fragment) of an SCA1 gene having a CAG repeat
region. For example, the primer labeled Rep-2 can be combined with the primer
labeled CAG-a, and the primer labeled CAG-b can be combined with the primer
labeled Rep-1. More preferably the primers are the sets of primer pairs designated as
CAG-a/CAG-b, Rep-1/Rep-2, Rep-1/GCT-435, for example. These primer sets
successfully amplify the CAG repeat units of interest using PCR technology.
Alternatively, they can be used in various known techniques to sequence the SCA1
gene.
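
Once a repeat-containing product has been amplified and sequenced, the number of consecutive CAG units can be counted directly. The following is a minimal sketch, assuming the read is available as a plain string on the CAG-containing strand; the function name and the example read are hypothetical, and no diagnostic threshold is implied.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the largest number of uninterrupted CAG units in a sequence."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    # Each run is a whole number of CAG triplets, so length // 3 is the unit count.
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical read: four CAG units, an interrupting CAT, then seven CAG units.
example_read = "TT" + "CAG" * 4 + "CAT" + "CAG" * 7 + "GG"
units = longest_cag_run(example_read)
```

Reporting the longest uninterrupted run rather than the total count is a design choice of this sketch; either could be computed from the same list of runs.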

As stated previously, other methods of diagnosis can be used as well.
They can be based on the isolation and identification of the repeat region of genomic
DNA (CAG repeat region), cDNA (CAG repeat region), mRNA (GUC repeat
region), and protein products (glutamine repeat region). These include, for example,
using a variety of electrophoresis techniques to detect slight changes in the
nucleotide sequence of the SCA1 gene. Further nonlimiting examples include
denaturing gradient electrophoresis, single strand conformational polymorphism
gels, and nondenaturing gel electrophoresis techniques.

The mapping and cloning of the SCA1 gene allows the definitive
diagnosis of one type of the dominantly inherited ataxias using a simple blood test.
This represents the first step towards an unequivocal molecular classification of the
dominant ataxias. A simple and reliable classification system for the ataxias is
important because the clinical symptoms overlap extensively between the SCA1 and
the non-SCA1 forms of the disease. Furthermore, a molecular test for the only
known SCA1 mutation permits presymptomatic diagnosis of disease in known
SCA1 families and allows for the identification of sporadic or isolated CAG repeat
expansions where there is no family history of the disease. Thus, the present
invention can be used in family counseling, planning medical treatment, and in
standard work-ups of patients with ataxia of unknown etiology.

B. Cloning

Cloning of SCA1 DNA into the appropriate replicable vectors allows
expression of the gene product, ataxin-1, and makes the SCA1 gene available for
further genetic engineering. Expression of ataxin-1, or portions thereof, is useful
because these gene products can be used as antigens to produce antibodies, as
described in more detail below.

1. Isolation of DNA

DNA containing the SCA1 gene may be obtained from any cDNA
library prepared from tissue believed to possess the SCA1 mRNA and to express it
at a detectable level. Preferably, the cDNA library is from human fetal brain or
adult cerebellum. Optionally, the SCA1 gene may be obtained from a genomic
DNA library or by in vitro oligonucleotide synthesis from the complete nucleotide
or amino acid sequence.
Libraries are screened with appropriate probes designed to identify
the gene of interest or the protein encoded by it. Preferably, for cDNA libraries,
suitable probes include oligonucleotides that consist of known or suspected portions
of the SCA1 cDNA from the same or different species; and/or complementary or
homologous cDNAs or fragments thereof that consist of the same or a similar gene.
Optionally, for cDNA expression libraries (which express the protein), suitable
probes include monoclonal or polyclonal antibodies that recognize and specifically
bind to the SCA1 gene product, ataxin-1. Appropriate probes for screening
genomic DNA libraries include, but are not limited to, oligonucleotides, cDNAs, or
fragments thereof that consist of the same or a similar gene, and/or homologous
genomic DNAs or fragments thereof. Screening the cDNA or genomic library with
the selected probe may be accomplished using standard procedures.

Screening cDNA libraries using synthetic oligonucleotides as probes
is a preferred method of practicing this invention. The oligonucleotide sequences
selected as probes should be of sufficient length and sufficiently unambiguous to
minimize false positives. The actual nucleotide sequence(s) of the probe(s) is
usually designed based on regions of the SCA1 gene that have the least codon
redundancy. The oligonucleotides may be degenerate at one or more positions, i.e.,
two or more different nucleotides may be incorporated into an oligonucleotide at a
given position, resulting in multiple synthetic oligonucleotides. The use of
degenerate oligonucleotides is of particular importance where a library is screened
from a species in which preferential codon usage is not known.
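
To make the notion of degeneracy concrete, the sketch below expands a degenerate probe into the set of individual oligonucleotides it represents. It assumes the probe is written with the standard IUPAC nucleotide ambiguity codes; the function name and the example probes are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> list:
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


# A probe with one two-fold degenerate position yields two oligonucleotides.
two_fold = expand_degenerate("CARGT")
# One four-fold and one two-fold position yield 4 * 2 = 8 oligonucleotides.
eight_fold = expand_degenerate("CANCAY")
```

The pool size is the product of the degeneracies at each position, which is why probes are usually drawn from regions with the least codon redundancy.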

The oligonucleotide can be labeled such that it can be detected upon 
hybridization to DNA in the library being screened. A preferred method of labeling 
is to use ATP and polynucleotide kinase to radiolabel the 5' end of the 
oligonucleotide. However, other methods may be used to label the oligonucleotide, 
including, but not limited to, biotinylation or enzyme labeling. 

Of particular interest is the SCA1 nucleic acid that encodes a full-length
mRNA transcript, including the complete coding region for the gene product,
ataxin-1. Nucleic acid containing the complete coding region can be obtained by
screening selected cDNA libraries using the deduced amino acid sequence.

An alternative means to isolate the SCA1 gene is to use PCR
methodology. This method requires the use of oligonucleotide primer probes that
will hybridize to the SCA1 gene. Strategies for selection of PCR primer
oligonucleotides are described below.

2. Insertion of DNA into Vector

The nucleic acid (e.g., cDNA or genomic DNA) containing the SCA1
gene is preferably inserted into a replicable vector for further cloning (amplification
of the DNA) or for expression of the gene product, ataxin-1. Many vectors are
available, and selection of the appropriate vector will depend on: 1) whether it is to
be used for DNA amplification or for DNA expression; 2) the size of the nucleic
acid to be inserted into the vector; and 3) the host cell to be transformed with the
vector. Most expression vectors are "shuttle" vectors, i.e., they are capable of
replication in at least one class of organism but can be transfected into another
organism for expression. For example, a vector is cloned in E. coli and then the
same vector is transfected into yeast or mammalian cells for expression even though
it is not capable of replicating independently of the host cell chromosome. Each
replicable vector contains various structural components depending on its function
(amplification of DNA or expression of DNA) and the host cell with which it is
compatible. These components are described in detail below.

Construction of suitable vectors employs standard ligation techniques
known in the art. Isolated plasmids or DNA fragments are cleaved, tailored, and
religated in the form desired to generate the plasmids required. Typically, the
ligation mixtures are used to transform E. coli K12 strain 294 (ATCC 31,446) and
successful transformants are selected by ampicillin or tetracycline resistance where
appropriate. Plasmids from the transformants are prepared, analyzed by restriction
endonuclease digestion, and/or sequenced by methods known in the art. See, e.g.,
Messing et al., Nucl. Acids Res., 2, 309 (1981) and Maxam et al., Methods in
Enzymology, 61, 499 (1980).

Optionally, DNA may also be amplified by direct insertion into the
host genome. This is readily accomplished using Bacillus species as hosts, for
example, by including in the vector a DNA sequence that is complementary to a
sequence found in Bacillus genomic DNA. Transfection of Bacillus with this vector
results in homologous recombination with the genome and insertion of SCA1 DNA.
However, the recovery of genomic DNA containing the SCA1 gene is more
complex than that of an exogenously replicated vector because restriction enzyme
digestion is required to excise the SCA1 DNA.

Replicable cloning and expression vector components generally 
include, but are not limited to, one or more of the following: a signal sequence, an 
origin of replication, one or more marker genes, an enhancer element, a promoter 
and a transcription termination sequence. 

Vector component: signal sequence. A signal sequence may be used
to facilitate extracellular transport of a cloned protein. To this end, the SCA1 gene
product, ataxin-1, may be expressed not only directly, but also as a fusion product
with a heterologous polypeptide, preferably a signal sequence or other polypeptide
having a specific cleavage site at the N-terminus of the cloned protein or
polypeptide. The signal sequence may be a component of the vector, or it may be a
part of the SCA1 DNA that is inserted into the vector. The heterologous signal
sequence selected should be one that is recognized and processed (i.e., cleaved by a
signal peptidase) by the host cell. For prokaryotic host cells, a prokaryotic signal
sequence may be selected, for example, from the group of the alkaline phosphatase,
penicillinase, lpp or heat-stable enterotoxin II leaders. For yeast secretion the signal
sequence used may be, for example, the yeast invertase, alpha factor, or acid
phosphatase leaders. In mammalian cell expression, a native signal sequence may
be satisfactory, although other mammalian signal sequences may be suitable, such
as signal sequences from secreted polypeptides of the same or related species, as
well as viral secretory leaders, for example, the herpes simplex gD signal.

Vector component: origin of replication. Both expression and
cloning vectors contain a nucleic acid sequence that enables the vector to replicate in
one or more selected host cells. Generally, in cloning vectors this sequence is one
that enables the vector to replicate independently of the host chromosomal DNA,
and includes origins of replication or autonomously replicating sequences. Such
sequences are well known for a variety of bacteria, yeast and viruses. The origin of
replication from the plasmid pBR322 is suitable for most Gram-negative bacteria,
the 2μ plasmid origin is suitable for yeast, and various viral origins (SV40,
polyoma, adenovirus, VSV or BPV) are useful for cloning vectors in mammalian
cells. Generally, the origin of replication component is not needed for mammalian
expression vectors (the SV40 origin may typically be used only because it contains
the early promoter).

Vector component: marker gene. Expression and cloning vectors
may contain a marker gene, also termed a selection gene or selectable marker. This
gene encodes a protein necessary for the survival or growth of transformed host cells
grown in a selective culture medium. Host cells not transformed with the vector
containing the selection gene will not survive in the culture medium. Typical
selection genes encode proteins that: (a) confer resistance to antibiotics or other
toxins, e.g., ampicillin, neomycin, methotrexate, streptomycin or tetracycline; (b)
complement auxotrophic deficiencies; or (c) supply critical nutrients not available
from complex media, e.g., the gene encoding D-alanine racemase for Bacilli. One
example of a selection scheme utilizes a drug to arrest growth of a host cell. Those
cells that are successfully transformed with a heterologous gene express a protein
conferring drug resistance and thus survive the selection regimen.

Examples of suitable selectable markers for mammalian cells are
those that enable the identification of cells competent to take up the SCA1 nucleic
acid, such as dihydrofolate reductase (DHFR) or thymidine kinase. The mammalian
cell transformants are placed under selection pressure that only the transformants are
uniquely adapted to survive by virtue of having taken up the marker. For example,
cells transformed with the DHFR selection gene are first identified by culturing all
the transformants in a culture medium that contains methotrexate, a competitive
antagonist for DHFR. An appropriate host cell when wild-type DHFR is employed
is the Chinese hamster ovary (CHO) cell line deficient in DHFR activity, prepared
and propagated as described by Urlaub et al., Proc. Natl. Acad. Sci. USA, 77, 4216
(1980). The transformed cells are then exposed to increased levels of methotrexate.
This leads to the synthesis of multiple copies of the DHFR gene, and,
concomitantly, multiple copies of the other DNA comprising the expression vectors,
such as the SCA1 gene. This amplification technique can be used with any
otherwise suitable host, e.g., ATCC No. CCL61 CHO-K1, notwithstanding the
presence of endogenous DHFR if, for example, a mutant DHFR gene that is highly
resistant to methotrexate is employed. Alternatively, host cells (particularly wild-type
hosts that contain endogenous DHFR) transformed or co-transformed with
SCA1 DNA, wild-type DHFR protein, and another selectable marker such as
aminoglycoside 3' phosphotransferase (APH) can be selected by cell growth in a
medium containing a selection agent for the selectable marker such as an
aminoglycosidic antibiotic, e.g., kanamycin or neomycin. A suitable selection gene
for use in yeast is the trp1 gene present in the yeast plasmid YRp7 (Stinchcomb et
al., Nature, 282, 39 (1979); Kingsman et al., Gene, 7, 141 (1979); or Tschemper et
al., Gene, 10, 157 (1980)). The trp1 gene provides a selection marker for a mutant
strain of yeast lacking the ability to grow in tryptophan, for example, ATCC No.
44076 or PEP4-1 (Jones, Genetics, 85, 12 (1977)). The presence of the trp1 lesion
in the yeast host cell genome then provides an effective environment for detecting
transformation by growth in the absence of tryptophan. Similarly, Leu2-deficient
yeast strains (ATCC 20,622 or 38,626) are complemented by known plasmids
bearing the Leu2 gene.

Vector component: promoter. Expression and cloning vectors
usually contain a promoter that is recognized by the host organism and is operably
linked to the SCA1 nucleic acid. Promoters are untranslated sequences located
upstream (5') to the start codon of a structural gene (generally within about 100 to
1000 bp) that control the transcription and translation of a particular nucleic acid
sequence, such as the ataxin-1 nucleic acid sequence, to which they are operably
linked. Such promoters typically fall into two classes, inducible and constitutive.
Inducible promoters are promoters that initiate increased levels of transcription from
DNA under their control in response to some change in culture conditions, e.g., the
presence or absence of a nutrient or a change in temperature. In contrast,
constitutive promoters produce a constant level of transcription of the cloned DNA
segment.

At this time a large number of promoters recognized by a variety of
potential host cells are well known in the art. Promoters are removed from their
source DNA using a restriction enzyme digestion and inserted into the cloning
vector using standard molecular biology techniques. Both the native SCA1
promoter sequence and many heterologous promoters can be used to direct
amplification and/or expression of the SCA1 DNA. Heterologous promoters are
preferred, as they generally permit greater transcription and higher yields of
expressed protein as compared to the native promoter. Well-known promoters
suitable for use with prokaryotic hosts include the beta-lactamase and lactose
promoter systems, alkaline phosphatase, a tryptophan (trp) promoter system, and
hybrid promoters such as the tac promoter. Such promoters can be ligated to SCA1
DNA using linkers or adapters to supply any required restriction sites. Promoters
for use in bacterial systems may contain a Shine-Dalgarno sequence for RNA
polymerase binding.

Promoter sequences are known for eukaryotes. Virtually all
eukaryotic genes have an AT-rich region located approximately 25 to 30 bp
upstream from the site where transcription is initiated. Another sequence found 70
to 80 bases upstream from the start of transcription of many genes is the CXCAAT
region where X may be any nucleotide. At the 3' end of most eukaryotic genes is an
AATAAA sequence that may be a signal for addition of the poly A tail to the 3' end
of the coding sequence. All these sequences are suitably inserted into eukaryotic
expression vectors. Examples of suitable promoting sequences for use with yeast
hosts include the promoters for 3-phosphoglycerate kinase or other glycolytic
enzymes, such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase,
pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase,
3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase,
phosphoglucose isomerase and glucokinase. Other yeast promoters, which are
inducible promoters having the additional advantage of transcription controlled by
growth conditions, are the promoter regions for alcohol dehydrogenase 2,
isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen
metabolism, metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and
enzymes responsible for maltose and galactose utilization.

SCA1 transcription from vectors in mammalian host cells can be
controlled, for example, by promoters obtained from the genomes of viruses such as
polyoma virus, fowlpox virus, adenovirus (such as Adenovirus 2), bovine papilloma
virus, avian sarcoma virus, cytomegalovirus, a retrovirus, Hepatitis-B virus and
most preferably Simian Virus 40 (SV40) (Fiers et al., Nature, 273, 113 (1978);
Mulligan et al., Science, 202, 1422-1427 (1980); Pavlakis et al., Proc. Natl. Acad.
Sci. USA, 78, 7398-7402 (1981)). Heterologous mammalian promoters (e.g., the
actin promoter or an immunoglobulin promoter) and heat-shock promoters can also
be used, as can the promoter normally associated with the SCA1 sequence itself,
provided such promoters are compatible with the host cell systems.

Vector component: enhancer element. Transcription of SCA1 DNA
by higher eukaryotes can be increased by inserting an enhancer sequence into the
vector. Enhancers are cis-acting elements of DNA, usually having about 10 to 300
bp, that act on a promoter to increase its transcription. Enhancers are relatively
orientation- and position-independent, having been found 5' and 3' to the
transcription unit, within an intron as well as within the coding sequence itself.
Many enhancer sequences are now known from mammalian genes (globin, elastase,
albumin, alpha-fetoprotein, and insulin). Typically, however, an enhancer from a
eukaryotic cell virus will be used. Examples include the SV40 enhancer on the late
side of the replication origin, the cytomegalovirus early promoter enhancer, the
polyoma enhancer on the late side of the replication origin, and adenovirus
enhancers. The enhancer may be spliced into the vector at a position 5' or 3' to the
SCA1 gene, but is preferably located at a site 5' of the promoter.

Vector component: transcription termination. Expression vectors
used in eukaryotic host cells (yeast, fungi, insect, plant, animal, human or nucleated
cells from other multicellular organisms) can also contain sequences necessary for
the termination of transcription and for stabilizing the mRNA. Such sequences are
commonly available from the 5' and, occasionally, 3' untranslated regions of
eukaryotic or viral DNAs or cDNAs. These regions can contain nucleotide
segments transcribed as polyadenylated fragments in the untranslated portion of
mRNA encoding ataxin-1.

Preferably, the pMAL™-2 vectors (New England Biolabs, Beverly,
MA) are used to create the expression vector. These vectors provide a convenient
method for expressing and purifying ataxin-1 produced from the cloned SCA1 gene.
The SCA1 gene is inserted downstream from the malE gene of E. coli, which
encodes maltose-binding protein (MBP), resulting in the expression of an MBP
fusion protein. The method uses the strong "tac" promoter and the malE translation
initiation signals to give high-level expression of the cloned sequences, and a one-step
purification of the fusion protein using MBP's affinity for maltose. The vectors
express the malE gene (with or without its signal sequence) fused to the lacZα gene.
Restriction sites between malE and lacZα are available for inserting the coding
sequence of interest. Insertion inactivates the β-galactosidase α-fragment activity of
the malE-lacZα fusion, which results in a blue to white color change on Xgal plates
when the construction is transformed into an α-complementing host such as TB1
(T.C. Johnston et al., J. Biol. Chem., 261, 4805-4811 (1986)) or JM107 (C. Yanisch-Perron
et al., Gene, 22, 103-119 (1985)). When present, the signal peptide on pre-MBP
directs fusion proteins to the periplasm. For fusion proteins that can be
successfully exported, this allows folding and disulfide bond formation to take place
in the periplasm of E. coli, as well as allowing purification of the protein from the
periplasm. The vectors carry the lacI gene, which codes for the Lac repressor
protein. This keeps expression from Ptac low in the absence of isopropyl
β-D-thiogalactopyranoside (IPTG) induction. The pMAL™-2 vectors also contain the
sequence coding for the recognition site of the specific protease factor Xa, located
just 5' to the polylinker insertion sites. This allows MBP to be cleaved from ataxin-1
after purification. Factor Xa cleaves after its four amino acid recognition
sequence, so that few or no vector derived residues are attached to the protein of
interest, depending on the site used for cloning.
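
The cleavage step described above can be pictured as splitting the fusion protein immediately after the protease recognition sequence. The text states only that the recognition sequence is four amino acids long; the sketch below assumes the commonly cited factor Xa site Ile-Glu-Gly-Arg (IEGR in one-letter code). The function name and the example fusion sequence are hypothetical and do not represent the actual MBP or ataxin-1 sequences.

```python
# Assumption: the four-residue site is Ile-Glu-Gly-Arg, written in one-letter code.
FACTOR_XA_SITE = "IEGR"


def cleave_after_site(fusion: str, site: str = FACTOR_XA_SITE) -> tuple:
    """Split a fusion protein immediately after the first recognition site.

    Returns (carrier_with_site, released_protein). If the site is absent the
    fusion is returned unchanged with an empty second element.
    """
    index = fusion.find(site)
    if index < 0:
        return fusion, ""
    cut = index + len(site)
    return fusion[:cut], fusion[cut:]


# Hypothetical fusion: carrier residues, the site, then the protein of interest.
carrier, released = cleave_after_site("MKIEEGKIEGRMKSNQ")
```

Because the cut falls after the site, the recognition residues stay with the carrier, which is consistent with the statement that few or no vector derived residues remain on the protein of interest.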

Also useful are expression vectors that provide for transient
expression in mammalian cells of SCA1 DNA. In general, transient expression
involves the use of an expression vector that is able to replicate efficiently in a host
cell, such that the host cell accumulates many copies of the expression vector and, in
turn, synthesizes high levels of a desired polypeptide encoded by the expression
vector. Transient expression systems, comprising a suitable expression vector and a
host cell, allow for the convenient positive identification of polypeptides encoded by
cloned DNAs, as well as for the rapid screening of such polypeptides for desired
biological or physiological properties. Thus, transient expression systems are
particularly useful in the invention for purposes of identifying analogs and variants
of ataxin-1 that have wild-type or variant biological activity.

Suitable host cells for cloning or expressing the vectors herein are the
prokaryote, yeast, or higher eukaryotic cells described above. Suitable prokaryotes
include eubacteria, such as Gram-negative or Gram-positive organisms, for
example, E. coli, Bacilli such as B. subtilis, Pseudomonas species such as P.
aeruginosa, Salmonella typhimurium, or Serratia marcescens. One preferred E. coli
cloning host is E. coli 294 (ATCC 31,446), although other strains such as E. coli B,
E. coli X1776 (ATCC 31,537), and E. coli W3110 (ATCC 27,325) are suitable.
These examples are illustrative rather than limiting. Preferably the host cell should
secrete minimal amounts of proteolytic enzymes. Alternatively, in vitro methods of
cloning, e.g., PCR or other nucleic acid polymerase reactions, are suitable.

In addition to prokaryotes, eukaryotic microbes such as filamentous
fungi or yeast are suitable hosts for SCA1-encoding vectors. Saccharomyces
cerevisiae, or common baker's yeast, is the most commonly used among lower
eukaryotic host microorganisms. However, a number of other genera, species, and
strains are commonly available and useful herein, such as Schizosaccharomyces
pombe, Kluyveromyces hosts such as, e.g., K. lactis, K. fragilis, K. bulgaricus, K.
thermotolerans, and K. marxianus, Yarrowia, Pichia pastoris, Candida,
Trichoderma reesia, Neurospora crassa, and filamentous fungi such as, e.g.,
Neurospora, Penicillium, Tolypocladium, and Aspergillus hosts such as A. nidulans.
Suitable host cells for the expression of glycosylated ataxin-1 are
derived from multicellular organisms. Such host cells are capable of complex
processing and glycosylation activities. In principle, any higher eukaryotic cell
culture is workable, whether from vertebrate or invertebrate culture. Examples of
invertebrate cells include plant and insect cells. Numerous baculoviral strains and
variants and corresponding permissive insect host cells from hosts such as
Spodoptera frugiperda (caterpillar), Aedes aegypti (mosquito), Aedes albopictus
(mosquito), Drosophila melanogaster (fruitfly), and Bombyx mori have been
identified. See, e.g., Luckow et al., Bio/Technology, 6, 47-55 (1988); Miller et al.,
Genetic Engineering, 8, 277-279 (1986); and Maeda et al., Nature, 115, 592-594
(1985). A variety of viral strains for transfection are publicly available, e.g., the L-1
variant of Autographa californica NPV and the Bm-5 strain of Bombyx mori NPV,
and such viruses may be used as the virus herein according to the present invention,
particularly for transfection of Spodoptera frugiperda cells.

Plant cell cultures of cotton, corn, potato, soybean, petunia, tomato,
and tobacco can be utilized as hosts. Typically, plant cells are transfected by
incubation with certain strains of the bacterium Agrobacterium tumefaciens, which
has been previously manipulated to contain the SCA1 DNA. During incubation of
the plant cell culture with A. tumefaciens, the SCA1 DNA is transferred to the plant
cell host such that it is transfected, and will, under appropriate conditions, express
the SCA1 DNA. In addition, regulatory and signal sequences compatible with plant
cells are available, such as the nopaline synthase promoter and polyadenylation
signal sequences. Depicker et al., J. Mol. Appl. Gen., 1, 561 (1982).

Vertebrate cells can also be used as hosts. Propagation of vertebrate
cells in culture (tissue culture) has become a routine procedure in recent years.
Examples of useful mammalian host cell lines are monkey kidney CV1 line
transformed by SV40 (COS-7, ATCC CRL 1651); human embryonic kidney line
(293 or 293 cells subcloned for growth in suspension culture, Graham et al., J. Gen.
Virol., 16, 59 (1977)); baby hamster kidney cells (BHK, ATCC CCL 10); Chinese
hamster ovary cells/-DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA,
77, 4216 (1980)); mouse Sertoli cells (TM4, Mather, Biol. Reprod., 22, 243-251
(1980)); monkey kidney cells (CV1, ATCC CCL 70); African green monkey kidney
cells (VERO-76, ATCC CRL-1587); human cervical carcinoma cells (HELA,
ATCC CCL 2); canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells
(BRL 3A, ATCC CRL 1442); human lung cells (W138, ATCC CCL 75); human
liver cells (Hep G2, HB 8065); mouse mammary tumor (MMT 060562, ATCC CCL
51); TRI cells (Mather et al., Annals N.Y. Acad. Sci., 383, 44-68 (1982)); MRC 5
cells; FS4 cells; and a human hepatoma line (Hep G2).

4. Transfection and transformation

Host cells are transfected and preferably transformed with the above-described
expression or cloning vectors of this invention and cultured in
conventional nutrient media modified as appropriate for inducing promoters,
selecting transformants, or amplifying the genes encoding the desired sequences.

Transfection refers to the taking up of an expression vector by a host
cell whether or not any coding sequences are in fact expressed. Numerous methods
of transfection are known to the ordinarily skilled artisan, for example, the calcium
phosphate precipitation method and electroporation are commonly used. Successful
transfection is generally recognized when any indication of the operation of the
vector occurs within the host cell.

Transformation means introducing DNA into an organism so that the
DNA is replicable, either as an extrachromosomal element or by chromosomal
integration. Depending on the host cell used, transformation is done using standard
techniques appropriate to such cells. Calcium chloride is generally used for
prokaryotes or other cells that contain substantial cell-wall barriers. Infection with
Agrobacterium tumefaciens can be used for transformation of certain plant cells.
For mammalian cells without cell walls, the calcium phosphate precipitation method
of Graham et al., Virology, 12, 456-457 (1978) is preferred. Transformations into
yeast are typically carried out according to the method of Van Solingen et al., J.
Bact., 130, 946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA), 28, 3829
(1979). However, other methods for introducing DNA into cells such as by nuclear
injection, electroporation, or protoplast fusion may also be used.

5. Cell Culture

Prokaryotic cells used to produce the SCA1 gene product, ataxin-1,
are cultured in suitable media, as described generally in Sambrook et al. The
mammalian host cells used to produce the SCA1 gene product may be cultured in a
variety of media. Commercially available media such as Ham's F10 (Sigma),
Minimal Essential Medium (MEM, Sigma), RPMI-1640 (Sigma), and Dulbecco's
Modified Eagle's Medium (DMEM, Sigma) are suitable for culturing the host cells.
These media may be supplemented as necessary with hormones and/or other growth
factors (such as insulin, transferrin, or epidermal growth factor), salts (such as
sodium chloride, calcium, magnesium, and phosphate), buffers (such as HEPES),
nucleosides (such as adenosine and thymidine), antibiotics (such as Gentamycin™
drug), trace elements (defined as inorganic compounds usually present at final
concentrations in the micromolar range), and glucose or an equivalent energy
source. Any other necessary supplements may also be included at appropriate
concentrations that would be known to those skilled in the art. The culture
conditions, such as temperature, pH, and the like, are those previously used with the
host cell selected for expression, and will be apparent to the ordinarily skilled
artisan. The host cells referred to in this disclosure encompass cells in in vitro
culture as well as cells that are within a host animal.

25 C rvQUm 

The SCA1 gene encodes a novel protein, ataxin-1, a representative
example of which is shown in Figure 15 with an estimated molecular weight of
about 87 kD. It is to be understood that ataxin-1 represents a set of proteins
produced from the SCA1 gene with its unstable CAG region. Ataxin-1 can be
produced from cell cultures. With the aid of recombinant DNA techniques,
synthetic DNA and cDNA coding for ataxin-1 can be introduced into
microorganisms which can then be made to produce the peptide. It is also possible
to manufacture ataxin-1 synthetically, in a manner such as is known for peptide
syntheses.



wo 95/01437 



PCT/US94/07336 



-28- 

Ataxin-1 is preferably recovered from the culture medium as a 
cytosolic polypeptide, although it can also be recovered as a secreted polypeptide 
when expressed with a secretory signal. 

Ataxin-1 can be purified from recombinant cell proteins or
polypeptides to obtain preparations that are substantially homogeneous as to ataxin-1.
As a first step, the culture medium or lysate is centrifuged to remove particulate cell
debris. The membrane and soluble protein fractions are then separated. The
ataxin-1 may then be purified from the soluble protein fraction and from the membrane
fraction of the culture lysate, depending on whether the ataxin-1 is membrane
bound. If necessary, ataxin-1 is further purified from contaminant soluble proteins
and polypeptides, with the following procedures being exemplary of suitable
purification procedures: fractionation on immunoaffinity or ion-exchange
columns; ethanol precipitation; reverse phase HPLC; chromatography on silica or on
an ion-exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium
sulfate precipitation; gel filtration using, for example, Sephadex G-75; and ligand
affinity chromatography using, e.g., protein A Sepharose columns to remove
contaminants such as IgG.

Ataxin-1 variants in which residues have been deleted, inserted, or
substituted are recovered in the same fashion as native ataxin-1, taking account of
any substantial changes in properties occasioned by the variation. For example,
preparation of an ataxin-1 fusion with another protein or polypeptide, e.g., a bacterial
or viral antigen, facilitates purification; an immunoaffinity column containing
antibody to the antigen can be used to adsorb the fusion polypeptide.
Immunoaffinity columns such as a rabbit polyclonal anti-ataxin-1 column can be
employed to adsorb the ataxin-1 variant by binding it to at least one remaining
immune epitope. Alternatively, the ataxin-1 may be purified by affinity
chromatography using a purified ataxin-1-IgG coupled to a (preferably) immobilized
resin such as Affi-Gel 10 (Bio-Rad, Richmond, CA) or the like, by means well
known in the art. A protease inhibitor such as phenyl methyl sulfonyl fluoride
(PMSF) also may be useful to inhibit proteolytic degradation during purification,
and antibiotics may be included to prevent the growth of adventitious contaminants.

Covalent modifications of ataxin-1 are included within the scope of
this invention. Both native ataxin-1 and amino acid sequence variants of
ataxin-1 may be covalently modified. Covalent modifications included within the scope of
this invention are those producing one or more ataxin-1 fragments. Ataxin-1
fragments having any number of amino acid residues may be conveniently prepared
by chemical synthesis, by enzymatic or chemical cleavage of the full-length or
variant ataxin-1 polypeptide, or by cloning and expressing only portions of the
SCA1 gene. Other types of covalent modifications of ataxin-1 or fragments thereof
are introduced into the molecule by reacting targeted amino acid residues of the
ataxin-1 or fragments thereof with a derivatizing agent capable of reacting with
selected side chains or the N- or C-terminal residues.

For example, cysteinyl residues most commonly are reacted with
α-haloacetates (and corresponding amines), such as iodoacetic acid or iodoacetamide,
to give carboxymethyl or carboxyamidomethyl derivatives. Cysteinyl residues also
are derivatized by reaction with bromotrifluoroacetone,
α-bromo-β-(5-imidozoyl)propionic acid, iodoacetyl phosphate, N-alkylmaleimides,
3-nitro-2-pyridyl disulfide, methyl 2-pyridyl disulfide, p-chloromercuribenzoate,
2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-oxa-1,3-diazole.

Histidyl residues are derivatized by reaction with
diethylpyrocarbonate or p-bromophenacyl bromide. Lysinyl and amino terminal residues are
derivatized with succinic or other carboxylic acid anhydrides and imidoesters such
as methyl picolinimidate; pyridoxal phosphate; pyridoxal; chloroborohydride;
trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and
transaminase-catalyzed reaction with glyoxylate. Arginyl residues are modified by reaction with
phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, and ninhydrin, among
others.

Specific modification of tyrosyl residues may be made, with
particular interest in introducing spectral labels into tyrosyl residues by reaction
with aromatic diazonium compounds or tetranitromethane. Most commonly,
N-acetylimidazole and tetranitromethane are used to form O-acetyl tyrosyl species and
3-nitro derivatives, respectively. Tyrosyl residues are iodinated using 125I or 131I to
prepare labeled proteins for use in radioimmunoassay, the chloramine T method
described above being suitable.

Carboxyl side groups (aspartyl or glutamyl) are selectively modified
by reaction with carbodiimides (R-N=C=N-R'), where R and R' are different alkyl
groups, such as 1-cyclohexyl-3-(2-morpholinyl-4-ethyl)carbodiimide or
1-ethyl-3-(4-azonia-4,4-dimethylpentyl)carbodiimide. Furthermore, aspartyl and glutamyl
residues are converted to asparaginyl and glutaminyl residues by reaction with
ammonium ions.

Derivatization with bifunctional agents is useful for crosslinking
ataxin-1 to a water-insoluble support matrix or surface for use in the method for
purifying anti-ataxin-1 antibodies, and vice versa. Commonly used crosslinking
agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, and
N-hydroxysuccinimide esters, for example, esters with 4-azidosalicylic acid,
homobifunctional imidoesters, including disuccinimidyl esters such as
3,3'-dithiobis(succinimidylpropionate), and bifunctional maleimides such as
bis-N-maleimido-1,8-octane. Derivatizing agents such as
methyl-3-[(p-azidophenyl)dithio]propioimidate yield photoactivatable intermediates that are
capable of forming crosslinks in the presence of light. Alternatively, reactive
water-insoluble matrices such as cyanogen bromide-activated carbohydrates and the
reactive substrates are employed for protein immobilization.

Glutaminyl and asparaginyl residues are frequently deamidated to the
corresponding glutamyl and aspartyl residues, respectively. These residues are
deamidated under neutral or basic conditions. The deamidated form of these
residues falls within the scope of this invention.

Other modifications include hydroxylation of proline and lysine,
phosphorylation of hydroxyl groups of seryl or threonyl residues, methylation of the
α-amino groups of lysine, arginine, and histidine side chains, acetylation of the
N-terminal amine, amidation of any C-terminal carboxyl group, and glycosylation of
any suitable residue.

D. Antibodies

The present invention also relates to polyclonal or monoclonal
antibodies raised against ataxin-1 or ataxin-1 fragments (preferably fragments
having 8-40 amino acids, more preferably 10-20 amino acids, that form the surface
of the folded protein), or variants thereof, and to diagnostic methods based on the
use of such antibodies, including but not limited to Western blotting and ELISA
(enzyme-linked immunosorbent assay).

Polyclonal antibodies to the SCA1 polypeptide generally are raised in
animals by multiple subcutaneous (sc) or intraperitoneal (ip) injections of ataxin-1,
ataxin-1 fragments, or variants thereof, and an adjuvant. The polypeptide can be a
cloned gene product or a synthetic molecule. Preferably, it corresponds to a position
in the protein sequence that is on the surface of the folded protein and is thus likely
to be antigenic. It may be useful to conjugate the SCA1 polypeptide (including
fragments containing a specific amino acid sequence) to a protein that is
immunogenic in the species to be immunized, e.g., keyhole limpet hemocyanin,
serum albumin, bovine thyroglobulin, or soybean trypsin inhibitor, using a
bifunctional or derivatizing agent, for example, maleimidobenzoyl sulfosuccinimide
ester (conjugation through cysteine residues), N-hydroxysuccinimide (through
lysine residues), glutaraldehyde, succinic anhydride, SOCl2, or R'N=C=NR, where R
and R' are different alkyl groups. Conjugates also can be made in recombinant cell
culture as protein fusions. Also, aggregating agents such as alum are used to
enhance the immune response.

The route and schedule of immunizing a host animal or removing and
culturing antibody-producing cells are variable and are generally in keeping with
established and conventional techniques for antibody stimulation and production.
While mice are frequently employed as the host animal, it is contemplated that any
mammalian subject including human subjects or antibody-producing cells obtained
therefrom can be manipulated according to the processes of this invention to serve
as the basis for production of mammalian, including human, hybrid cell lines.
Preferably, rabbits are used to raise antibodies against ataxin-1.

Animals are typically immunized against the immunogenic
conjugates or derivatives by combining about 10 µg to about 1 mg of ataxin-1 with
about 2-3 volumes of Freund's complete adjuvant and injecting the solution
intradermally at multiple sites. About one month later the animals are boosted with
about 1/5 to about 1/10 the original amount of conjugate in Freund's complete
adjuvant (or other suitable adjuvant) by subcutaneous injection at multiple sites.
About 7 to 14 days later animals are bled and the serum is assayed for anti-ataxin-1
polypeptide titer.

Serum antibodies (IgG) are purified via protein purification protocols
that are well known in the art. Antibody/antigen reactivity is analyzed using
Western blotting, wherein suspected antigens are blotted to a nitrocellulose filter,
exposed to potential antibodies and allowed to hybridize under defined conditions.
See Gershoni et al., Anal. Biochem., 131, 1-15 (1983). The protein antigens can
then be sequenced using standard sequencing methods directly from the
antibody/antigen complexes on the nitrocellulose support.

Monoclonal antibodies are prepared by recovering immune cells
(typically spleen cells or lymphocytes from lymph node tissue) from immunized
animals (usually mice) and immortalizing the cells in conventional fashion, e.g., by
fusion with myeloma cells. The hybridoma technique described originally by
Kohler et al., Eur. J. Immunol., 6, 511 (1976) has been widely applied to produce
hybrid cell lines that secrete high levels of monoclonal antibodies against many
specific antigens. It is possible to fuse cells of one species with another. However,
it is preferable that the source of the immunized antibody-producing cells and the
myeloma be from the same species. Although mouse monoclonal antibodies are
routinely used, the present invention is not so limited; human antibodies may be
used and may prove to be preferable. Such antibodies can be obtained by using human
hybridomas. See, Cote et al., Monoclonal Antibodies and Cancer Therapy, A.R. Liss,
Ed., p. 77 (1985).

The secreted antibody is recovered from tissue culture supernatant by
conventional methods such as precipitation, ion exchange chromatography, affinity
chromatography, or the like. The antibodies described herein are also recovered
from hybridoma cell cultures by conventional methods for purification of IgG or
IgM, as the case may be, that heretofore have been used to purify these
immunoglobulins from pooled plasma, e.g., ethanol or polyethylene glycol
precipitation procedures. The purified antibodies are sterile filtered, and optionally
are conjugated to a detectable marker such as an enzyme or spin label for use in
diagnostic assays of the ataxin-1 in test samples.

Techniques for creating recombinant DNA versions of the
antigen-binding regions of antibody molecules (known as Fab fragments), which bypass the
generation of monoclonal antibodies, are encompassed within the practice of this
invention. Antibody-specific messenger RNA molecules are extracted from
immune system cells taken from an immunized animal, transcribed into
complementary DNA (cDNA), and the cDNA is cloned into a bacterial expression
system.

The anti-ataxin-1 antibody preparations of the present invention are
specific to ataxin-1 and do not react immunochemically with other substances in a
manner that would interfere with a given use. For example, they can be used to
screen for the presence of ataxin-1 in tissue extracts to determine tissue-specific
expression levels of ataxin-1.

The present invention also encompasses an immunochemical assay
that involves subjecting antibodies directed against ataxin-1 to reaction with the
ataxin-1 present in a sample to thus form an (ataxin-1/anti-ataxin-1) immune
complex, the formation and amount of which are measures, qualitative and
quantitative, respectively, of the ataxin-1 presence in the sample. The addition of
other reagents capable of biospecifically reacting with constituents of the
protein/antibody complex, such as anti-antibodies provided with analytically
detectable groups, facilitates detection of ataxin-1 and is especially useful for
quantitating its level in biological samples.
Ataxin-1/anti-ataxin-1 complexes can also be subjected to amino acid
sequencing using methods well known in the art to determine the length of a
polyglutamine region and thereby provide information about likelihood of affliction
with spinocerebellar ataxia and likely age of onset. Competitive inhibition and
non-competitive methods, precipitation methods, heterogeneous and homogeneous
methods, various methods named according to the analytically detectable group
employed, immunoelectrophoresis, particle agglutination, immunodiffusion and
immunohistochemical methods employing labeled antibodies may all be used in
connection with the immune assay described above.

The invention has been described with reference to various specific
and preferred embodiments and will be further described by reference to the
following detailed examples. It is understood, however, that there are many
extensions, variations, and modifications on the basic theme of the present invention
beyond that shown in the examples and detailed description, which are within the
spirit and scope of the present invention.

Experimental Section
I. The Gene for SCA1 Maps Centromeric to D6S89

To confirm the position of SCA1 with respect to D6S89 and to identify
closer flanking markers, two dinucleotide repeat polymorphisms D6S109 and
D6S202 were used. Using YAC clones isolated in the D6S89 region, three
additional dinucleotide repeat polymorphisms were identified, one of which
(AM10GA) showed no recombination with SCA1 and confirmed that D6S89 is
telomeric to SCA1. The dinucleotide repeat at D6S109 revealed six recombination
events with SCA1 and determined D6S109 to be the other flanking marker at the
centromeric end. Linkage analysis, physical mapping data as discussed below, and
analysis of recombination events demonstrated that the order of markers is as
follows: Centromere - D6S109 - AM10GA/SCA1 - D6S89 - SB1 - LR40 - D6S202 -
Telomere.

A. Materials and Methods

1. Kindreds

Nine large SCA1 families were used in the present study. Clinical
findings and linkage data demonstrating that these families segregated SCA1 have
been previously reported. See, J.F. Jackson et al., N. Engl. J. Med., 296, 1138-1141
(1977); B.J.B. Keats et al., Am. J. Hum. Genet., 49, 972-977 (1991); L.P.W. Ranum
et al., Am. J. Hum. Genet., 49, 31-41 (1991); and H.Y. Zoghbi et al., Am. J. Hum.
Genet., 49, 23-30 (1991). Analysis of polymorphisms at the loci D6S109,
AM10GA, SB1, LR40, and D6S202 was performed on individuals from these
kindreds.

The Houston (TX-SCA1) kindred included 106 individuals, of whom
57 (25 affected) were genotyped. See, H.Y. Zoghbi et al., Ann. Neurol., 23,
580-584 (1988). Patients symptomatic at the time of exam, as well as asymptomatic
individuals who had both a symptomatic child and a symptomatic parent, were
classified as "affected." In this kindred, a deceased individual previously assigned
as affected (from family history data) was reassigned an unknown status after
review of medical records. This reassignment eliminated what was previously
thought to be a recombination event between SCA1 and D6S89 in the TX-SCA1
kindred. To maximize the amount of information available for linkage analysis, the
two chromosomes 6 were separated in somatic cell hybrids for 15 affected
individuals and one unaffected individual from the TX-SCA1 kindred. See, H.Y. Zoghbi
et al., Am. J. Hum. Genet., 44, 255-263 (1989). The Louisiana (LA-SCA1) kindred
included 50 individuals, of whom 26 (8 affected) were genotyped. See, B.J.B. Keats
et al., Am. J. Hum. Genet., 49, 972-977 (1991). The Minnesota (MN-SCA1)
kindred included 175 individuals, of whom 106 (17 affected) were genotyped. See,
J.L. Haines et al., Neurology, 34, 1542-1548 (1984); and L.P.W. Ranum et al., Am.
J. Hum. Genet., 49, 31-41 (1991). The Michigan (MI-SCA1) kindred included 201
individuals, of whom 127 (25 affected) were genotyped. See, H.E. Nino et al.,
Neurology, 30, 12-20 (1980). The Mississippi (MS-SCA1) kindred included 84
individuals, of whom 37 (17 affected) were genotyped. See, J.F. Jackson et al.,
N. Engl. J. Med., 296, 1138-1141 (1977).

Four Italian families segregating SCA1 were analyzed; their clinical
phenotype and HLA linkage data were reported previously. See, M. Spadaro et al.,
Acta Neurol. Scand., 85, 257-265 (1992). Three families originated in the Calabria
Region (Southern Italy): family IT-P with 135 members, of whom 80 (21 affected)
were genotyped (for computational reasons, this family was subdivided into 3
different pedigrees (RM, VI, and FB) and only one of the 3 consanguinity loops was
considered); family IT-NS, with 43 members of whom 27 (7 affected) were typed; and
family IT-NS with 51 members of whom 16 (3 affected) were typed. The fourth
family, IT-MR, originated from Latium and consisted of 17 individuals of whom 10
(4 affected) were genotyped.



2. CEPH Families

The 40 CEPH reference families were genotyped at the D6S109, LR40
and D6S202 loci in order to provide a large number of informative meioses for
marker-marker linkage analyses. Markers AM10GA and SB1 flank D6S89, having
been isolated from a yeast artificial chromosome (YAC) contig built bidirectionally
from D6S89 (see below). A subset of 18 CEPH families which defined 26
recombinants between D6S109 and D6S89 was genotyped at AM10GA and SB1 in
order to determine the order of AM10GA, D6S89 and SB1 with respect to D6S109.

3. Cloning of Sequences Containing Dinucleotide Repeats 

The identification and description of polymorphic dinucleotide repeats
at the D6S109 and D6S202 loci have been previously reported. See, L.P.W. Ranum
et al., Nucleic Acids Res., 19, 1171 (1991); and F. LeBorgne-Demarquoy et al.,
Nucleic Acids Res., 19, 6060 (1991).

DNA fragments containing dinucleotide repeats were cloned at LR40
and SB1 from yeast artificial chromosome (YAC) clones at the LR40 and FLB1
loci, respectively (see below). DNA from each YAC clone was amplified in a 50 µl
reaction containing 20 ng DNA, a single Alu primer (see below), 50 mM KCl, 10
mM Tris-Cl pH 8.3, 1.25 mM MgCl2, 200 or 250 µM dNTPs, 0.01% (w/v) gelatin,
and 1.25 units Thermus aquaticus DNA polymerase (Taq polymerase; Perkin
Elmer, Norwalk, CT). For amplification of FLB1 YAC DNA, a primer
complementary to the 5' end of the Alu consensus sequence (Oncor Laboratories,
Gaithersburg, MD), designated SAL1 (5'-AGGAGTGAGCCACCGCACCCAGCC-3'),
was used at a final concentration of 0.6 µM. For
amplification of LR40 YAC DNA, 0.2 µM primer PDJ34 was used. See, C.
Breukel et al., Nucleic Acids Res., 18, 3097 (1990). Samples were overlaid with
mineral oil, denatured at 94°C for 5 minutes, then subjected to 30 cycles of 1 minute
94°C denaturation, 1 minute 55°C annealing, and 5 minutes 72°C extension. The
last extension step was lengthened to 10 minutes. Electrophoresis of 15 µl of PCR
products was performed on a 1.5% agarose gel, which was Southern blotted and
hybridized with a probe prepared by random-hexamer-primed labelling of synthetic
poly(dG-dT)-poly(dA-dC) (Pharmacia, Piscataway, NJ) using [α-32P]dCTP, as
described by A.P. Feinberg et al., Anal. Biochem., 137, 266-267 (1984). Fragments
hybridizing with the dinucleotide repeat probe were identified and were
subsequently purified by electrophoresis on a low-melt agarose gel. Fragments were
excised and reamplified by PCR as above.
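
As a rough planning aid, the cycling program described above can be written
out and its hold times added up. The sketch below is illustrative only: the
names Step and total_hold_minutes are hypothetical, ramp times between
temperatures are ignored, and the statement that the last extension step was
lengthened to 10 minutes is read as replacing the final 5-minute extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in a thermal cycling program (hypothetical helper type)."""
    temp_c: float
    minutes: float


def total_hold_minutes(initial, cycle, n_cycles, final_extension_minutes):
    """Sum the programmed hold times, ignoring ramping between temperatures.

    Assumption: the last cycle's extension is replaced by the longer final
    extension rather than followed by an additional hold.
    """
    per_cycle = sum(step.minutes for step in cycle)
    extra_on_last_cycle = final_extension_minutes - cycle[-1].minutes
    return initial.minutes + n_cycles * per_cycle + extra_on_last_cycle


# Values taken from the Alu-primed YAC amplification described in the text.
INITIAL_DENATURATION = Step(temp_c=94, minutes=5)
CYCLE = [
    Step(temp_c=94, minutes=1),  # denaturation
    Step(temp_c=55, minutes=1),  # annealing
    Step(temp_c=72, minutes=5),  # extension
]

TOTAL_MINUTES = total_hold_minutes(INITIAL_DENATURATION, CYCLE, 30, 10)
```

Under these assumptions the program holds for 5 + 30 x 7 + 5 minutes in
total; real run times are longer because of ramping.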

For LR40, reamplified DNA was repurified by low-melt gel
electrophoresis, and DNA was extracted from excised bands by passage through a
glasswool spin column as described by D.M. Heery et al., Trends Genet., 6, 173
(1990). A purified 1.2-kb fragment was cloned into pBluescript plasmid modified
as a "T-vector" as described by D. Marchuck et al., Nucleic Acids Res., 19, 1154
(1990). From this clone, a 0.6-kb HincII restriction fragment containing a GT repeat
was subcloned into pBluescript plasmid, and sequenced on an Applied Biosystems,
Inc. (Foster City, CA) automated sequencer.

For SB1, a reamplified 1-kb fragment was ethanol precipitated and
blunt-end cloned into pBluescript plasmid. Plasmid DNA was isolated and PCR
amplified in one reaction with M13 Reverse primer plus BamGT primer
(5'-CCCGGATCCTGTGTGTGTGTGTGTGTG-3') and in a second reaction with M13
Universal primer and BamCA primer
(5'-CCCGGATCCACACACACACACACACAC-3'). See, C.A. Feener et al., Am. J.
Hum. Genet., 48, 621-627 (1991). PCR conditions were as above except that primers
were used at 1 µM concentration; 2.5 units Taq polymerase and approximately 30
ng DNA were used per reaction, with final reaction volumes of 100 µl and an
annealing temperature of 50°C. Products were precipitated, resuspended, and
digested with BamHI (product of Universal primer reaction) or BamHI and HincII
(product of Reverse primer reaction). These two fragments were cloned into
pBluescript plasmid and sequenced as above.
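
The layout of the two tailed primers can be checked with a few lines of code.
The sketch below is illustrative and not part of the described method: the
function names are hypothetical, and the only outside fact relied on is that
GGATCC is the BamHI recognition sequence. It splits each primer into a 5'
clamp, the restriction site, and the dinucleotide repeat tail.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_tailed_primer(primer, site=BAMHI_SITE):
    """Split a primer into (5' clamp, restriction site, 3' tail).

    Raises ValueError when the site is not present in the primer.
    """
    start = primer.find(site)
    if start < 0:
        raise ValueError(f"site {site} not found in primer")
    end = start + len(site)
    return primer[:start], primer[start:end], primer[end:]


# Primer sequences as given in the text (5' to 3').
BAM_GT = "CCCGGATCCTGTGTGTGTGTGTGTGTG"
BAM_CA = "CCCGGATCCACACACACACACACACAC"

GT_CLAMP, GT_SITE, GT_TAIL = split_tailed_primer(BAM_GT)
CA_CLAMP, CA_SITE, CA_TAIL = split_tailed_primer(BAM_CA)
```

Because the BamHI site is its own reverse complement, it is regenerated on
both strands of the amplified product, which is consistent with the BamHI
digestion step that follows.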

Dinucleotide repeats were cloned at AM10 from a YAC containing
this locus. A λFixII library was constructed using DNA from this yeast clone, and
human clones were identified by filter hybridization using human placental DNA as
a probe. A gridded array of these human clones was grown, and filters containing
DNA from these clones were hybridized with a 32P-labelled
poly(dG-dT)-poly(dA-dC) probe as described above. DNA was prepared from positive clones, digested
with various restriction enzymes, and analyzed by agarose gel electrophoresis.
Southern blotting and hybridization were carried out with the
poly(dG-dT)-poly(dA-dC) probe. A 1-kb fragment hybridizing with the dinucleotide repeat probe was
identified, cloned into M13, and sequenced.

4. PCR Analysis

Primer sequences and concentrations, and PCR cycle times used for
amplification of dinucleotide repeat sequences from human genomic DNA are
presented in Table 1. For the LR40 polymorphism, primer set "A" was used for
analysis of the TX-SCA1, LA-SCA1, and MS-SCA1 kindreds, while primer set "B"
was used for all other kindreds. Buffer compositions were as follows: 50 mM KCl,
10 mM Tris-Cl pH 8.3, 1.25 mM MgCl2 (1.5 mM MgCl2 for AM10GA), 250 µM
dNTPs (200 µM dNTPs for AM10GA), 0.01% (w/v) gelatin, and 0.5 - 0.625 unit
Taq polymerase. For the LR40 analysis, 2% formamide was included in the PCR
buffer. When primer set B was used for LR40 analysis, 125 µM dNTPs, 1.5 mM
MgCl2, and 1 unit Taq polymerase were used. All reaction volumes were 25 µl and
contained 40 ng genomic DNA. Four microliters of each reaction was mixed with 2
µl formamide loading buffer, denatured at 90-100°C for 3 minutes, and cooled on ice,
and 2-4 µl was used for electrophoresis on a 4% or 6% polyacrylamide/7.65 M urea
sequencing gel for 2-3 hours at 1100 V. PCR assay conditions have been reported
previously for D6S202 and D6S109. See, L.P.W. Ranum et al., Nucleic Acids Res.,
19, 1171 (1991); and F. LeBorgne-Demarquoy et al., Nucleic Acids Res., 19, 6060
(1991).






Table 1.

Primers and PCR conditions for amplification of
dinucleotide repeat sequences

Marker/Type       Primers                         PCR Steps          Cycles

AM10GA/(GA)n      AAGTCAGCCTCTACTCTTTGTTGA        94°C for 30 sec.   30
                  CTTGGAGCAGTCTGTAGGGAG           55°C for 30 sec.
                                                  72°C for 30 sec.

SB1/(GT)n         TGAAGTGATGTGCTCTGTTC            94°C for 60 sec.   30
                  AAAGGGGTAGAGGAAATGAG            60°C for 60 sec.
                                                  72°C for 60 sec.

LR40/(GT)n        AGGAGAGGGGTCATGAGTTG            94°C for 60 sec.   25
set A             GGCTCATGAATACATTACATGAAG        58°C for 60 sec.
                                                  72°C for 60 sec.

LR40/(GT)n        CTCATTCACCTTAGAGACAAATGGATAG    94°C for 60 sec.   27
set B             ATGGTATAGGGATTTTNCCAAACCTG      60°C for 60 sec.
                                                  72°C for 45 sec.

Primers are shown as 5' to 3' sequence. The first primer of each pair was
end-labelled with [γ-32P]ATP and polynucleotide kinase. Primer concentrations
were 1 µM.
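
For orientation, the annealing temperatures in Table 1 can be compared with a
quick melting-temperature estimate. The sketch below uses the Wallace rule of
thumb, Tm = 2(A+T) + 4(G+C) in °C, which is an assumption brought in for
illustration and is not stated in the text; the function name is hypothetical,
and primers containing an ambiguous base (N) are rejected rather than guessed.

```python
def wallace_tm(primer):
    """Estimate primer Tm in °C with the Wallace rule: 2(A+T) + 4(G+C).

    This is a coarse rule of thumb for short oligonucleotides; it ignores
    salt, formamide, and primer concentration.
    """
    primer = primer.upper()
    unexpected = set(primer) - set("ACGT")
    if unexpected:
        raise ValueError(f"unsupported bases: {sorted(unexpected)}")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# First SB1 primer from Table 1 (5' to 3').
SB1_FORWARD = "TGAAGTGATGTGCTCTGTTC"
SB1_FORWARD_TM = wallace_tm(SB1_FORWARD)
```

An estimate of this kind is only a starting point; the annealing temperatures
actually used are those listed in the table.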



5. SCA1 Linkage Analysis

The D6S109, AM10GA, D6S89, SB1, LR40 and D6S202 markers
were analyzed for linkage to SCA1 using the computer program LINKAGE version
5.1, which includes the MLINK, ILINK, LINKMAP, CLODSCORE and CMAP
programs. See, G.M. Lathrop et al., Proc. Natl. Acad. Sci. USA, 81, 3443-3446
(1984). Age-dependent penetrance classes were assigned independently for each of
the families included in the analysis. Marker alleles were recoded to reduce the
number of alleles segregating in a family to four, five or six alleles to simplify the
analysis. The allele frequencies for the various markers were based on the
frequencies of the alleles among the spouses in each family and were determined
separately for the two American black kindreds, for the Italian kindreds, and for the
Caucasian kindreds from Minnesota, Michigan, and Mississippi, with the following
exception: the allele frequencies for D6S109 in the MI and MN kindreds were
based on the frequencies of the alleles in the CEPH families.

Maximum LOD scores for the various markers were calculated with
the MLINK program by running each of the analyses separately for the various
families, at theta values with increments of 0.0005 to 0.001, and then adding the
values of each of the kindreds. The analyses were done separately to ensure that the
allele frequencies for the various markers were representative for each of the
ethnically diverse families. As a control, the recombination fractions at the
maximum LOD scores (Zmax) between each marker and SCA1 were calculated using
the ILINK program after the allele frequencies for each marker were set equal to one
another. In all cases the recombination frequencies were the same and values
were very similar to those reported in Table 5 below.
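
The summation step can be illustrated with a deliberately simplified model.
The sketch below is not the MLINK algorithm: it assumes phase-known, fully
informative meioses with complete penetrance, so that each family's LOD score
has a closed form, and the family counts are invented purely for illustration.
It evaluates every family on a common theta grid, adds the per-family scores,
and reports the grid point with the highest total.

```python
import math


def lod_phase_known(theta, n_meioses, n_recombinants):
    """Two-point LOD, log10[L(theta) / L(0.5)], for phase-known meioses."""
    n, k = n_meioses, n_recombinants
    if theta <= 0.0:
        # theta = 0 is only compatible with zero observed recombinants.
        return n * math.log10(2) if k == 0 else float("-inf")
    return (k * math.log10(theta)
            + (n - k) * math.log10(1 - theta)
            + n * math.log10(2))


def summed_max_lod(families, thetas):
    """Add per-family LOD scores at each theta and return (theta, Zmax)."""
    best_theta, best_z = None, float("-inf")
    for theta in thetas:
        z = sum(lod_phase_known(theta, n, k) for n, k in families)
        if z > best_z:
            best_theta, best_z = theta, z
    return best_theta, best_z


# Hypothetical (meioses, recombinants) counts for two kindreds.
HYPOTHETICAL_FAMILIES = [(10, 0), (10, 1)]
# Grid from 0 to 0.5 in steps of 0.001, the coarser increment in the text.
THETA_GRID = [i / 1000 for i in range(0, 501)]

BEST_THETA, BEST_Z = summed_max_lod(HYPOTHETICAL_FAMILIES, THETA_GRID)
```

Summing LOD scores across kindreds is valid because the families are
independent, so their likelihoods multiply and the logarithms add.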


6. CEPH Linkage Analysis 

Forty CEPH families were typed for the GT repeat markers D6S109,
D6S202 and LR40. The original alleles were recoded to five alleles. The SB1 and
AM10 markers were typed in a subset of the CEPH panel which defined 26
recombinants from 18 different families between D6S109 and D6S89. The
CLODSCORE program was used for the two-point analyses and CMAP was used
for the three- and four-point analyses. For the three-point and four-point analyses,
the interval between the mapped markers was fixed based on the two-point
$\theta_m = \theta_f$ results. The likelihood of the location of the test locus
(SCA1) was calculated at 10 different positions within each interval. The test for
sex difference in the $\theta$ values was performed using a $\chi^2$ statistic, with
$\chi^2 = 2(\ln 10)[Z(\theta_m, \theta_f) - Z(\theta = \theta_m = \theta_f)]$,
where $Z(\theta_m, \theta_f)$ is the overall Zmax for arbitrary $\theta_m$ and
$\theta_f$, while $Z(\theta = \theta_m = \theta_f)$ is the Zmax constrained to
$\theta_m = \theta_f$. Under homogeneity (H1), the statistic approximates a
$\chi^2$ distribution with 1 d.f. Rejection of homogeneity occurs when
$\chi^2 > 3.84$.
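
The sex-difference test reduces to a short calculation once the two maximum
LOD scores are in hand. The sketch below is a minimal illustration with
hypothetical function names and invented Zmax values; the factor 2 ln(10)
converts a difference of base-10 LOD scores into a likelihood-ratio statistic,
and 3.84 is the critical value quoted in the text for 1 d.f.

```python
import math

CRITICAL_VALUE_1DF = 3.84  # threshold quoted in the text


def sex_difference_chi2(z_sex_specific, z_sex_equal):
    """Likelihood-ratio statistic 2 ln(10) [Z(theta_m, theta_f) - Z(theta)]."""
    return 2 * math.log(10) * (z_sex_specific - z_sex_equal)


def rejects_homogeneity(chi2, critical=CRITICAL_VALUE_1DF):
    """True when equal male and female recombination fractions are rejected."""
    return chi2 > critical


# Invented example values: Zmax with free theta_m, theta_f versus Zmax with
# theta_m constrained to equal theta_f.
CHI2_LARGE = sex_difference_chi2(6.0, 5.0)
CHI2_SMALL = sex_difference_chi2(5.5, 5.0)
```

With these invented scores, a Zmax gain of 1.0 gives a statistic of about 4.6
and rejects homogeneity, whereas a gain of 0.5 gives about 2.3 and does not.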

B. Results 

1. Dinucleotide Repeat Cloning and Sequencing and Analysis

Dinucleotide repeats SB1 and LR40 were amplified directly from
YAC clones by Alu-primed PCR and the dinucleotide repeat containing fragments
were identified by hybridization. The PCR products were cloned either directly or
by further amplification using tailed poly(GT) or poly(GA) primers paired with an
Alu primer. In addition, two dinucleotide repeats were subcloned from a lambda
phage clone from a library constructed from a YAC at the AM10 locus.

Dinucleotide repeats from the SB1, LR40, and AM10 loci were
sequenced. At LR40, the cloned repeat sequence was (GA)16TA(CA)10. The AM10
fragment contained two repeat sequences separated by 45 bp of nonrepeat sequence.
The first repeat, designated AM10GA, was (GA)2ATGACA(GA)n. The second
repeat, designated AM10GT, was not used in this study because upon analysis of the
TX-SCA1 kindred it yielded the same information as the AM10GA repeat. The
AM10GT repeat consists of (GA)2AA(GA)6GTGA(GT)16AT(GT)5. Primer
information for AM10GT is available through the Genome Data Base. At SB1, the
repeat tract was not sequenced; only flanking sequence was determined.
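
Compact repeat notation of this kind can be expanded mechanically into the
nucleotide sequence it stands for. The sketch below is illustrative only: the
function name is hypothetical, the subscripts are written as plain digits (for
example, the AM10GT tract is read here as (GA)2AA(GA)6GTGA(GT)16AT(GT)5), and
a repeat with a symbolic count such as (GA)n is rejected because it cannot be
expanded.

```python
import re

# Either a parenthesised unit with a numeric repeat count, or a literal run.
_TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def expand_repeat(notation):
    """Expand notation such as '(GA)2AA(GT)3' into 'GAGAAAGTGTGT'."""
    parts = []
    position = 0
    for match in _TOKEN.finditer(notation):
        if match.start() != position:
            raise ValueError(f"cannot parse {notation!r} at offset {position}")
        position = match.end()
        unit, count, literal = match.groups()
        parts.append(unit * int(count) if unit else literal)
    if position != len(notation):
        raise ValueError(f"cannot parse {notation!r} at offset {position}")
    return "".join(parts)


# Repeat tracts as read from the text, with subscripts written as digits.
LR40_REPEAT = expand_repeat("(GA)16TA(CA)10")
AM10GT_REPEAT = expand_repeat("(GA)2AA(GA)6GTGA(GT)16AT(GT)5")
```

The expanded strings make it easy to compute tract lengths or to search for
the repeat in a longer sequence.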

As there are differences in allele distributions of markers among the 
25 different races, allele frequencies are reported here separately for the CEPH kindreds 
(Caucasian) and the TX-SCAl kmdred (American black) (Table 2). CEPH allele 
frequencies were based on 72 independent chromosomes for SBL 82 independent 
chromosomes for AM 10, and on the full set of 40 families for D6S109 and LR40. 
TX-SCAl allele firequencies were based on 45 independent chromosomes for LR40, 
30 43 independent chromosomes for SBl, 45 independent chromosomes for AM 10, 
and 42 independent chromosomes for D6S109. 



Table 2. Allele frequencies of the SB1, LR40, AM10 and D6S109 markers in the
CEPH and TX-SCA1 kindreds (table body not recoverable from the source).

2. Genetic Linkage Data

a. CEPH families. In order to establish a well-defined genetic map for
the SCA1 region, newly isolated DNA markers were mapped using the CEPH
reference families. Results of pairwise linkage analyses in CEPH kindreds are
shown in Table 3. No recombination was observed between AM10GA and D6S89
(θ = 0.00, Zmax = 15.1) using a subset of the CEPH panel which defined 26
recombinants between D6S109 and D6S89. The markers D6S109 and LR40 are
close to D6S89, with recombination fractions of 0.067 (Zmax = 71.4) and 0.04 (Zmax
= 84.5) respectively.

Selected multipoint analyses were performed to position the newly
isolated markers D6S109, LR40, D6S202 with respect to markers previously
mapped using the CEPH panel. The CMAP program was used for three- and four-
point linkage analyses to position D6S109 relative to D6S88 and D6S89 and to
position LR40 and D6S202 relative to each other and to D6S89 and F13A. For the
three-point analyses, the D6S88 - D6S89 interval was fixed based on the two-point
recombination fraction in CEPH and the lod score was calculated at various
recombination fractions. The order D6S88 - D6S109 - D6S89 is favored over the
next most likely order by odds of 4 x 10^ : 1 (Table 4). For the four-point analyses,
both the D6S89 - D6S202 - F13A and the D6S89 - LR40 - F13A intervals were
fixed based on the two-point recombination fractions; lod scores were then
calculated for LR40 and D6S202 at various θ values on the respective fixed maps.
The order D6S89 - LR40 - D6S202 - F13A is favored over the next most likely
order in both analyses; odds in favor were 400 : 1 when the position of LR40 was
varied and were 1 x 10^ to 1 when D6S202 was varied (Table 4).
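
Odds in favor of one locus order over another, as quoted above, correspond to a difference in base-10 lod scores between the two orders. A minimal sketch of that conversion follows; the function names are hypothetical and are not part of CMAP.

```python
import math


def odds_from_lod_difference(delta_lod: float) -> float:
    """Odds (x : 1) implied by a difference in base-10 lod scores."""
    return 10.0 ** delta_lod


def lod_difference_from_odds(odds: float) -> float:
    """Lod-score difference implied by odds of `odds` : 1."""
    if odds <= 0:
        raise ValueError("odds must be positive")
    return math.log10(odds)


# Odds of 400 : 1 correspond to a lod difference of about 2.6.
delta = lod_difference_from_odds(400.0)
```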

The order of AM10GA and D6S89 could not be determined using the
D6S109/D6S89 CEPH recombinants. However, the order AM10GA - D6S89 - SB1
was deduced by characterization of overlapping yeast artificial chromosome clones
containing these markers (see below). Furthermore, one end of this contig is present
in a well characterized radiation-reduced hybrid known to contain D6S109 and other
centromeric markers, indicating the order D6S109 - AM10GA - D6S89 - SB1.

Table 3.

Pairwise linkage results in CEPH

Marker Pair         θm=θf   Zmax     θm      θf      Zmax    χ²

HLA and D6S88       0.128   26.4     0.103   0.168   26.8    1.86
        D6S109      0.126   48.4     0.062   0.176   51.0    12.1*
        AM10        0.608   0.0440   0.301   0.500   0.246   0.929
        D6S89       0.158   43.3     0.091   0.225   46.6    15.2*
        SB1         0.574   0.0190   0.299   0.500   0.400   0.381
        LR40        0.213   25.5     0.116   0.306   30.0    20.8*
        HZ30        0.251   21.6     0.191   0.318   23.6    8.95*
        F13A        0.291   8.81     0.255   0.326   9.14    1.52

D6S88 and D6S109    0.017   48.6     0.024   0.009   48.8    0.846
        AM10        0.654   0.0290   0.499   0.696   0.047   0.0820
        D6S89       0.086   36.1     0.076   0.098   36.2    0.0750
        SB1         0.203   1.09     0.136   0.687   1.36    1.27
        LR40        0.088   31.1     0.078   0.104   31.2    0.350
        HZ30        0.135   30.4     0.124   0.152   30.4    0.340
        F13A        0.180   10.2     0.158   0.217   10.3    0.626

D6S109 and AM10     0.730   0.933    0.170   0.502   1.67    3.39
        D6S89       0.067   71.4     0.035   0.090   72.5    5.15*
        SB1         [row illegible in the source]
        LR40        0.109   50.6     0.050   0.152   52.9    10.5*
        HZ30        0.162   36.6     0.147   0.174   36.7    0.515
        F13A        0.207   14.4     0.211   0.204   14.4    0.0368

AM10 and D6S89      0.000   15.1     0.000   0.000   15.1    0.000
        SB1         0.000   13.2     0.000   0.000   13.2    0.000
        LR40        0.021   8.74     0.000   0.050   9.11    1.74
        HZ30        0.000   13.8     0.000   0.000   13.8    0.000
        F13A        0.135   3.48     0.042   0.253   4.39    4.16*

D6S89 and SB1       0.000   25.0     0.000   0.000   25.0    0.000
        LR40        0.040   84.5     0.030   0.049   84.7    0.925
        HZ30        0.078   76.0     0.075   0.077   76.0    0.0230
        F13A        0.151   30.7     0.139   0.160   30.7    0.248

SB1 and LR40        0.033   14.4     0.022   0.044   14.5    0.350
        HZ30        0.026   17.5     0.032   0.020   17.5    0.0300
        F13A        0.136   4.80     0.119   0.155   4.84    0.170

LR40 and HZ30       0.079   64.8     0.092   0.050   65.0    1.09
        F13A        0.131   29.1     0.121   0.140   29.2    0.189

HZ30 and F13A       0.109   38.4     0.122   0.106   38.4    0.0092

*Indicates statistically significant differences were observed in the recombination
fractions when the assumption of homogeneity (θm = θf) was rejected; that is, the
likelihood that χ² > 3.84 with 1 degree of freedom should occur by chance is P <
0.05.

b. SCA1 kindreds. Results of pairwise linkage analyses in SCA1
kindreds are shown in Table 5. AM10GA, D6S89, and SB1 are all closely linked to
SCA1. No recombination was observed between AM10GA and SCA1; the lod
score is 42.1 at a recombination fraction of 0.00. The recombination fraction
between D6S89 and SCA1 is 0.004 (lod score of 67.6). The recombination fraction
between SB1 and SCA1 is 0.007 (lod score of 39.5). D6S109, LR40 and D6S202
are linked to SCA1 as well, but at greater distances (recombination fractions of 0.04,
0.03, and 0.08 respectively). Based on genetic mapping in nine large kindreds, the
SCA1 locus is very close to D6S89 and AM10GA, with a Zmax-1 support interval
less than or equal to 0.02 in both cases.
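
The lod scores reported above come from full pedigree likelihoods. As a rough illustration of what a two-point lod score measures, the sketch below assumes phase-known, fully informative meioses, a simplification that is not stated in the source; the function name and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Base-10 lod score Z(theta) for phase-known, fully informative meioses.

    Z(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ],
    with k recombinants among n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    if recombinants and theta == 0.0:
        return float("-inf")  # any observed recombinant excludes theta = 0
    log_likelihood = 0.0
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if non_recombinants:
        log_likelihood += non_recombinants * math.log10(1.0 - theta)
    return log_likelihood - meioses * math.log10(0.5)


# With no recombinants, each informative meiosis adds log10(2) at theta = 0.
z_no_recombinants = two_point_lod(0, 10, 0.0)
z_one_recombinant = two_point_lod(1, 10, 0.1)
```

Under this simplification, a lod score peaking at a recombination fraction of 0.00 means that no recombinant was seen among the informative meioses.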
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3. Analysis of Key Recombinants 

One recombination event between D6S89 and SCA1 has been
confirmed in an affected individual. The patient, individual MI-2 in Figure 4, was
also recombinant at SB1, although uninformative at LR40 and D6S202. He carried
a disease haplotype at the HLA, D6S109 and AM10 loci, demonstrating that SCA1
is centromeric to D6S89, as indicated by the rightmost arrow in Figure 4. To
eliminate the possibility of sample mix-up, the patient's DNA was reextracted from
a hair sample and retyped for D6S109, D6S89, D6S202, LR40, AM10GA, and SB1.
The results from the hair sample matched those from the cell line originally
established from the patient's blood. The patient's medical records were carefully
reexamined and it was confirmed that he did indeed have ataxia. In addition, his
haplotypes were consistent with those of a sister and a daughter.

D6S109 lies centromeric to D6S89; six recombination events have
been observed between D6S109 and SCA1, as shown in Figure 4. At this point,
D6S109 is the centromeric marker closest to SCA1. The arrows in Figure 4 denote
the maximum region common to all affected chromosomes, and therefore the
maximum possible region containing the SCA1 gene, which extends from D6S89 to
D6S109.

No additional marker-SCA1 recombination events have been observed
between D6S89 and SB1. Markers further telomeric to SB1 show additional
recombination with SCA1: one recombination event between SCA1 and LR40 and
three recombination events between SCA1 and D6S202. These events are depicted
in Figure 4 (all recombination events depicted in Figure 4 are in affected
individuals).

II. Mapping and Cloning the Critical Region for the SCA1 Gene

A 2.5-Mb yeast artificial chromosome (YAC) contig was developed
with the ultimate goal of defining and cloning the region likely to contain the SCA1
gene (SCA1 critical region).

A. Materials and Methods

1. Cell lines

1-7 is a human-hamster hybrid cell line which contains the short arm of
chromosome 6 as its only human chromosome. See H.Y. Zoghbi et al., Genomics,
6, 352-357 (1990). R86, R78, R72, R54 and R17 are radiation reduced hybrid cell
lines retaining various portions of 6p22-p23. See H.Y. Zoghbi et al., Genomics, 9,
713-720 (1991). R54 retains markers known to be telomeric to D6S89, such as
D6S202 and F13A.

2. Generation of new DNA markers and Sequence-Tagged Sites (STSs)

DNA from a radiation reduced hybrid retaining D6S89 (R86) and
DNAs from four radiation hybrids (R78, R72, R54 and R17) which do not retain
D6S89 but retain markers immediately flanking D6S89 were used in comparative
Alu-PCR to isolate region-specific DNA markers. See D.L. Nelson et al., Proc.
Natl. Acad. Sci. USA, 86, 6686-6690 (1989); and H.Y. Zoghbi et al., Genomics, 9,
713-720 (1991). In addition, R78 was useful in eliminating markers derived from
the centromeric region of 6p. H.Y. Zoghbi et al., Genomics, 9, 713-720 (1991). Alu-
PCR was carried out using Alu primers 559 and 517 individually (D.L. Nelson et al.,
Proc. Natl. Acad. Sci. USA, 86, 6686-6690 (1989)) as well as PDJ34 (C. Breukel et
al., Nucleic Acids Res., 18, 3097 (1990)). Alu-PCR fragments found to be present
in R86 but absent in R78, R72, R54 and R17 were identified and were cloned into
EcoRV-digested pBluescript II KS+ plasmid (Stratagene, La Jolla, CA) which was
modified using the T-vector protocol. See D. Marchuk et al., Nucleic Acids Res.,
12, 1154 (1990). Cloned fragments were sequenced on an Applied Biosystems, Inc.
(Foster City, CA) automated sequencer to establish STSs.

3. Isolation and Characterization of YAC clones

The Washington University YAC library (B.H. Brownstein et al.,
Science, 244, 1348-1351 (1989)) and the CEPH YAC library (H.M. Albertsen et
al., Proc. Natl. Acad. Sci. USA, 87, 4256-4260 (1990)) were screened using a PCR-
based method. See E.D. Green et al., Proc. Natl. Acad. Sci. USA, 87, 1213-1217
(1990); and T.J. Kwiatkowski et al., Nucleic Acids Res., 18, 7191-7192 (1990).
PCR amplifications were carried out in 25-50 µl final volume with 50 mM KCl, 10
mM Tris-HCl pH 8.3, 1.25 mM MgCl2, 0.01% (w/v) gelatin, 250 µM of each dNTP,
1.25 units of Amplitaq polymerase (Perkin-Elmer, Norwalk, CT) and 1 µM of each
primer. PCR cycle conditions are specified in Table 6.
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Table 6. 
STSs and YACs in 6p22-p23

Probe        Primer set                   YACs^a                          Annealing temp.^b

D6S89        cttgttcatctgccttgtgcaccta    B126G2, B134D5, B172B3,         55°C
             agcgactgcctaaac              B214D3, CSC 12, 191D8, 299B3,
                                          379C2, 468D12, 124G2, 511H11

AM10         ttaaggaagtgttcacatcaggg      A23C3, A183C6, A250D5,          55°C
(D6S335)     aattgtgcttatgtcactggg        B238F12, A91D2

A250D5-L     aattctggagagaggatgttggt      195B5, 242C5, 475A6, 30F12      44°C
(D6S337)     tctttttttggtag

64U          catcgtgttgtgtggtgaagctc      492H3, 172B5, 227B1, 261H7      50°C
             agacgctaaactcaagg

D6S288       atgatccgtggtagtggcagga       60H7, 351B10                    55°C
             cctgttactgacgcc

D6S274       ctcatctgttgaatggggatctta     486F9, 149H3, 42A5, 283B2,      55°C
             aatgctatgccttccg             320E12

FLB1         tgcaaatccctcagttcacttgctt    140H2, 270D3, 274D12, 401D6,    50°C
(D6S339)     gactttgccatgttc              57G3, 168F1

AM12         atacccatacggatttgagggca      A71B3, 228A1, 193B3, 90A12,     55°C
(D6S336)     acactatcaggctaagaatg         539C11, 53G12, 35E8

53G12-L      caaataccagcaactcaccagc       3G6, 82G12, 98G5, 135F6,        58°C
             ggttccttcagcatcctacattc      198C8, 330G1

^a YACs in this study are from the CEPH and Washington University libraries. I.D.
numbers identify the library source (Washington University I.D. numbers are
preceded by a letter). Several YACs were identified with more than one STS; for
such information, please refer to Table 2.

^b PCR conditions were 94°C for 4 minutes followed by 35-40 cycles of 94°C
denaturation for 1 minute, annealing at the specified temperature for 1 minute, and
72°C extension for 2 minutes. A final extension step of 7 minutes at 72°C was used.
PCR buffer and primer concentrations are as described in the text; for the 53G12-L
STS a final concentration of 2% formamide was used in the PCR reaction.
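
The cycling conditions in the Table 6 footnote can be written down as a small data structure, which makes the programmed hold time easy to check. This is only a sketch: the class and function names are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """Hold times (in minutes) for a three-step PCR program."""

    annealing_temp_c: float
    cycles: int
    initial_denaturation_min: float = 4.0  # 94 C
    denaturation_min: float = 1.0          # 94 C
    annealing_min: float = 1.0             # at annealing_temp_c
    extension_min: float = 2.0             # 72 C
    final_extension_min: float = 7.0       # 72 C

    def hold_time_min(self) -> float:
        """Total programmed hold time, excluding temperature ramps."""
        per_cycle = self.denaturation_min + self.annealing_min + self.extension_min
        return (self.initial_denaturation_min
                + self.cycles * per_cycle
                + self.final_extension_min)


def table6_program(annealing_temp_c: float, cycles: int = 35) -> CyclingProgram:
    """Build the footnote-b program for one STS; cycles must be 35-40."""
    if not 35 <= cycles <= 40:
        raise ValueError("the footnote specifies 35-40 cycles")
    return CyclingProgram(annealing_temp_c=annealing_temp_c, cycles=cycles)


# D6S89 anneals at 55 C; A250D5-L at 44 C.
d6s89_run = table6_program(55.0)
a250d5_run = table6_program(44.0, cycles=40)
```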
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Yeast DNA-agarose blocks were prepared as described by D.C. 
Schwartz et al., Cell, 37, 67-75 (1984); and G.J.B. van Ommen et al. in Human
Genetic Diseases-A Practical Approach; K.E. Davies, ed.; pp. 113-117; IRL Press,
Oxford (1986). All the YAC clones were analyzed by pulsed-field gel
electrophoresis (PFGE) to determine the insert size and to confirm that a single
YAC was present in a specific colony. YAC inserts were sized by electrophoresing
yeast DNA through a 1% Fastlane agarose (FMC, Rockland, ME) gel in 0.5x TAE
(20 mM Tris-acetate/0.5 mM EDTA). For rapid detection of possible overlaps
between YAC clones isolated at different STSs, the labelled Alu-PCR products of
new YACs were hybridized to filters containing Alu-PCR products of individual
YACs in the region. Most of the YAC clones were tested for chimerism using the
Alu-PCR dot blot method described by S. Banfi et al., Nucleic Acids Res., 20, 1814
(1992). The Alu-PCR products from YAC clones were hybridized to a dot-blot
containing the Alu-PCR products from monochromosomal or highly reduced
hybrids representing each of the 24 different human chromosomes as previously
described by S. Banfi et al., Nucleic Acids Res., 20, 1814 (1992). In addition, a dot-
blot containing Alu-PCR products from radiation reduced hybrids representing
different segments of 6p was used to ensure that a YAC does not contain two non-
contiguous segments from 6p. Ends of YAC clones were isolated either by inverse-
PCR as previously described by G. Joslyn et al., Cell, 66, 601-613 (1991) or by Alu-
vector PCR as described by D.L. Nelson et al., Proc. Natl. Acad. Sci. USA, 88,
6157-6161 (1991). Alu-vector PCR was carried out using Alu primers PDJ34 and
SAL1, as described by C. Breukel et al., Nucleic Acids Res., 18, 3097 (1990); and
the pYAC4 vector primers described by M.C. Wapenaar et al., Hum. Mol. Genet., 2,
947-995 (1993) and analogous vectors described by G.P. Bates et al., Nature
Genetics, 1, 180-187 (1992). All YAC ends were regionally mapped by
hybridization to Southern blots containing EcoRI-digested DNAs from the YAC
clones and from the hybrid cell lines: 1-7, R86, and R72.

4. Cosmid library preparation from YACs

Cosmid libraries were prepared from four YAC clones: 227B1, 195B5,
A250D5, and 379C2. Genomic DNA from YACs was partially digested with MboI
and cloned into cosmid vector SuperCos 1 (Stratagene, La Jolla, CA) following the
manufacturer's recommendations. Clones containing human inserts were identified
using radiolabeled sheared human DNA as a probe.

5. Long-range restriction analysis

YAC plugs were digested to completion using rare-cutter restriction
enzymes as described by M.C. Wapenaar et al., Hum. Mol. Genet., 2, 947-995
(1993) and analogously by G.A. Silverman et al., Proc. Natl. Acad. Sci. USA, 86,
7485-7489 (1989). Enzymes were purchased from New England Biolabs (Beverly,
MA) and Boehringer Mannheim Biochemicals (Indianapolis, IN) and were used as
recommended by the manufacturer. All PFGE analyses were performed on a Bio-
Rad CHEF apparatus under conditions that separate DNA fragments in the 50 kb to
600 kb range. The gels were stained with ethidium bromide, and either acid nicked
or subjected to 200,000 µJ of UV energy in a UV Stratalinker 1800 (Stratagene, La
Jolla, CA). The gels were denatured in 0.4 N NaOH and transferred to Sure Blot
hybridization membrane (Oncor, Gaithersburg, MD) in either 10xSSC (1.5 M
NaCl/150 mM NaCitrate) or 0.4 N NaOH according to the manufacturer's
recommendations. Hybridizations of the filters were carried out using the probes
listed in Table 6 and Figure 6. In addition, pBR322 BamHI/PvuII fragments of 2.5 kb
and 1.6 kb, specific for the left (TRP/CEN) and right (URA) pYAC4 vector arms
respectively, were used. Probes were radiolabelled using the random priming
technique described by A.P. Feinberg et al., Anal. Biochem., 137, 266-267 (1984);
repetitive sequences were blocked using sheared human placental DNA as
previously described by P.O. Sealy et al., Nucleic Acids Res., 13, 1905-1922 (1985).

6. Dinucleotide repeat analysis

Primer sequences and PCR cycle conditions are presented in Table 6.
Buffer conditions were the same as for Alu-PCR. All reaction volumes were 25 µl
and contained 40 ng of genomic DNA. One primer of each pair was labelled at the
5' end with [γ-32P]dATP. Four microliters of each reaction was mixed with 2 µl
formamide loading buffer, denatured at 90-100°C for 3 minutes, cooled on ice and
4-6 µl was used for electrophoresis on a 4% polyacrylamide/7.65 M urea sequencing
gel.
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B. Results 

1. Generation of sequence tagged sites in 6p22-p23 and YAC screening

Comparative analysis of the Alu-PCR products from the radiation
hybrid which retains D6S89 (R86) and from the four radiation hybrids deleted for
D6S89 but retaining markers which flank D6S89 (R78, R72, R54 and R17) allowed
the identification of three new DNA fragments that were present in R86 but absent
in the other four. These three DNA fragments, termed AM10, AM12 and FLB1,
were isolated and mapped using a 6p somatic cell hybrid panel and the radiation
reduced hybrid panel (H.Y. Zoghbi et al., Genomics, 9, 713-720 (1991)) to confirm
their regional localization. All three mapped to 6p and to R86, confirming their
close proximity to the D6S89 locus. These three Alu-PCR fragments were
subcloned and sequenced to establish sequence tagged sites (STSs). STSs at
AM10, AM12, FLB1 and D6S89 were used to screen the Washington University
and the CEPH YAC libraries (H.M. Albertsen et al., Proc. Natl. Acad. Sci. USA,
87, 4256-4260 (1990); and B.H. Brownstein et al., Science, 244, 1348-1351 (1989)).
YACs isolated at these four STSs were analyzed for overlap. Insert termini from the
YACs representing contig ends were isolated, subcloned and sequenced to
establish new STSs for further YAC walking. In one case an STS was established
by using a subclone from a cosmid derived from a cosmid library generated for
YAC 195B5.

Recently several highly informative dinucleotide repeat markers have
been identified and mapped genetically by J. Weissenbach et al., Nature, 359, 794-
801 (1992). As discussed above, two markers, D6S274 and D6S288, were found to
map within the SCA1 critical region and were subsequently used to screen the YAC
libraries. Using the STSs listed in Table 6, YAC clones were isolated.

2. Characterization of YAC clones

The sizes of the YAC inserts were determined by pulsed-field gel
electrophoresis (PFGE); insert sizes ranged from 75 to 850 kb. Given the high
frequency of insert chimerism, an Alu-PCR based hybridization strategy for rapid
detection of chimerism, as described by S. Banfi et al., Nucleic Acids Res., 20, 1814
(1992), was used. Thirty of the YAC clones were tested using this approach and
eight (27%) were found to be chimeric. Insert ends isolated from YACs determined
to be non-chimeric by the dot blot hybridization approach mapped to 6p22-p23 with
the exception of the two ends from 198C8, which proved to map to other
chromosomes.

Two approaches were used to isolate YAC ends: inverse-PCR (G. Joslyn et al.,
Cell, 66, 601-613 (1991)) and Alu-vector PCR (analogous to that described by
D.L. Nelson et al., Proc. Natl. Acad. Sci. USA, 86, 6686-6690 (1989)). In total,
34 YAC ends were isolated; inverse-PCR yielded 26 ends and Alu-vector PCR
yielded 8 ends. To isolate the left end of the 195B5 YAC we screened a cosmid
library prepared from this YAC using pYAC4 left end sequences (S.K. Bronson et
al., Proc. Natl. Acad. Sci. USA, 88, 1676-1680 (1991)) as a probe. This approach
was taken because inverse-PCR yielded an end which was predominantly an Alu-
containing sequence and Alu-PCR failed to yield an end. Cosmid clone A32 was
found to contain the left end of 195B5 and a subclone, 64U, was used to establish an
STS for further YAC library screenings.

In order to confirm the 6p22-p23 regional origin of all YAC ends or
subclones, these fragments were used as probes against Southern blots containing
EcoRI-digested DNAs from a somatic cell hybrid retaining 6p (1-7), from radiation
reduced hybrids known to retain fragments of 6p (H.Y. Zoghbi et al., Genomics, 9,
713-720 (1991)) and from the YAC clones at a particular STS.

3. Probe content mapping of YACs

In order to define the degree of overlap between the clones and to
detect possible rearrangements such as internal deletions of the YACs, a probe
content mapping strategy was used based on: 1) PCR analysis of all the clones using
all the STSs in the region including both the ones described in Table 6, and those at
highly informative dinucleotide repeats such as AM10GA and SB1; and 2)
hybridization of Southern blots containing EcoRI-digested DNAs from YACs in the
relevant region, with densely-spaced DNA probes derived from YAC ends, cosmid
subclones of YACs, or Alu-PCR fragments from YACs. The results of this analysis
for a representative subset of the YACs (32 clones) are summarized in Table 7.
Thirty-nine YAC clones form an uninterrupted YAC contig from D6S274 to 82G12-
R (right end of YAC clone 82G12). Other than an internal deletion in one YAC
(351B10) no other deletions were detected within the resolution of this analysis;
furthermore the extent of chimerism for some YAC clones (such as 270D12 and
140H2) was determined. The centromere-telomere orientation of the YAC contig
on 6p was determined using both genetic data as well as physical mapping data.
Dinucleotide repeat analysis at D6S109, AM10GA, D6S89, and SB1 in the
key individual with a recombination event between D6S89 and SCA1 revealed that
the recombination event occurred between AM10GA and D6S89. Given that
D6S109 is centromeric to D6S89, the recombination analysis suggests that
AM10GA is centromeric to D6S89. The centromere-telomere position of SB1 with
respect to D6S89 could not be determined genetically.
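
Probe content mapping treats two clones as overlapping when they share at least one STS or probe, and a set of clones forms an uninterrupted contig when every clone can be reached from every other through such overlaps. The sketch below illustrates that check. The function names are hypothetical, and the STS content shown is only a small illustrative subset assembled from Table 6 and the physical mapping text of this section, not the full Table 7 data.

```python
from collections import deque

# Illustrative subset: clone name -> STSs detected in that clone.
STS_CONTENT = {
    "379C2": {"D6S89", "D6S260"},
    "168F1": {"D6S89", "D6S260", "FLB1"},
    "57G3": {"FLB1", "D6S289"},
    "35E8": {"D6S289", "AM12"},
}


def shared_sts(content, clone_a, clone_b):
    """STSs present in both clones (an empty set means no detected overlap)."""
    return content[clone_a] & content[clone_b]


def forms_single_contig(content):
    """True if all clones are linked, directly or indirectly, by shared STSs."""
    clones = list(content)
    if not clones:
        return False
    seen = {clones[0]}
    queue = deque([clones[0]])
    while queue:  # breadth-first walk over the overlap graph
        current = queue.popleft()
        for other in clones:
            if other not in seen and shared_sts(content, current, other):
                seen.add(other)
                queue.append(other)
    return len(seen) == len(clones)


contiguous = forms_single_contig(STS_CONTENT)

# Dropping the bridging clone 57G3 leaves 35E8 unconnected to the others.
without_bridge = {k: v for k, v in STS_CONTENT.items() if k != "57G3"}
gapped = forms_single_contig(without_bridge)
```

A shared STS shows only that two clones overlap; it does not by itself detect internal deletions or chimerism, which the text addresses with hybridization and restriction mapping.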
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TABLE 7. 

Characterization of YACs using 6p22-p23 STSs and YAC fragments
Note. (+) = present, (-) = absent; Y/N = chimerism is/not detected. YAC ends are identified by YAC
names followed by L or R for left or right.
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Physical mapping, using both radiation hybrids and YACs, was carried 
out to resolve the centromere-telomere order of the loci. The radiation reduced 
hybrids R17 and R72 are known to contain markers centromeric to D6S89; these 
markers include D6S108 and D6S88 which map centromeric to D6S109. See H.Y.
Zoghbi et al., Genomics, 9, 713-720 (1991). R72 also retains D6S109, but a small
gap in R17 was revealed as this radiation hybrid did not retain D6S109, but was
positive for an end isolated from a YAC at the D6S109 locus. Analysis of the
radiation reduced hybrids revealed that D6S274 and D6S288 are present in R17,
R72 and R86, whereas AM10GA, D6S89, and SB1 are present only in R86
(Figure 5). Furthermore, STS content mapping with D6S260 and D6S289, two
dinucleotide repeats that are telomeric to D6S288 (J. Weissenbach et al., Nature,
359, 794-801 (1992)), revealed that D6S260 is present in the same YACs as D6S89
and SB1 (379C2 and 168F1), and that D6S289 is present in 57G3 and 35E8, two
YACs derived using the FLB1 and AM12 STSs respectively. These data confirm
that the order of the loci as well as the centromere-telomere orientation of the YAC
contig presented in Figure 6 is correct.

Figure 6 shows a selected subset of YAC clones which span the entire
contig from D6S274 to 82G12-R. A minimal number of 8 YACs spans this region.
The positions of the STSs which were used to isolate the YACs are also shown.
Based on the size of the YACs and the degree of overlap, this contig is estimated to
span 2.5 Mb of genomic DNA in 6p22-p23 with D6S89 located approximately in
the middle.



4. Delineating the SCA1 critical region

Genetic studies using recently identified dinucleotide repeats
(AM10GA and SB1) showed that SCA1 maps centromeric to the D6S89 locus very
close to AM10GA (peak lod score of 42.1 at a recombination frequency of zero) in
nine large SCA1 kindreds (Example 1, above). Thus D6S89 is the closest flanking
marker at the telomeric end. Previously, the closest flanking marker at the
centromeric end was D6S109, a dinucleotide repeat estimated to be 6.7 cM
centromeric to D6S89. To identify a closer flanking marker at the centromeric end,
we mapped D6S260, D6S274, D6S288 and D6S289, four dinucleotide repeat-
containing markers known to map to 6p22-p23 (J. Weissenbach et al., Nature, 359,
794-801 (1992)). The regional mapping of these markers was done using radiation
reduced hybrids and the YAC clones isolated from this region. These data revealed
that D6S274 and D6S288 map centromeric to AM10GA, as evident by amplification
of DNA from radiation hybrids R17 and R72 which are known to be centromeric to
AM10GA. Genotypical analysis of the DNAs from individuals with key
recombination events between D6S109 and D6S89 as well as from affected and
normal individuals (to establish chromosomal phase) from the five SCA1 kindreds
(MN-SCA1, MI-SCA1, TX-SCA1, M-SCA1 and MS-SCA1) was carried out. This
analysis revealed no recombination between D6S288 and SCA1. A single
recombination event between D6S274 and D6S288 was detected in individual MN-1
from the MN-SCA1 kindred (Figure 7); this individual was one of the six
individuals identified above as having a recombination event between SCA1 and
D6S109. This analysis allowed us to identify D6S274 as the closest flanking
marker at the centromeric end. These data, combined with those discussed above,
determined that the SCA1 critical region maps between D6S274 and D6S89. This
candidate region (1.2 Mb) is cloned in a minimum of four overlapping and non-
chimeric YACs as shown in Figure 8.
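
The 6.7 cM distance quoted above for the D6S109 - D6S89 interval matches the two-point recombination fraction of 0.067 read directly as a map distance, which is a common approximation for small intervals. For comparison only, the sketch below converts a recombination fraction to centimorgans with the Haldane and Kosambi mapping functions; neither function is mentioned in the source, and the function names are hypothetical.

```python
import math


def direct_cm(theta: float) -> float:
    """Map distance taken as 100 * theta (small-interval approximation)."""
    return 100.0 * theta


def haldane_cm(theta: float) -> float:
    """Haldane mapping function (no interference), in centimorgans."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


def kosambi_cm(theta: float) -> float:
    """Kosambi mapping function (partial interference), in centimorgans."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


theta_d6s109_d6s89 = 0.067
distances = {
    "direct": direct_cm(theta_d6s109_d6s89),
    "haldane": haldane_cm(theta_d6s109_d6s89),
    "kosambi": kosambi_cm(theta_d6s109_d6s89),
}
```

At this recombination fraction the three estimates differ by less than 0.5 cM, so the choice of mapping function has little effect on the interval size.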

5. Long-range restriction mapping

In order to have an estimate of the size of the YAC contig in the SCA1
critical region we performed long-range restriction analysis on YACs from this
region. The YACs used for this analysis included: 227B1, 60H7, 351B10, 172B5,
195B5, A250D5, 379C2, and 168F1. The following rare-cutter restriction enzymes
were used: NotI, BssHII, NruI, MluI, and SacII. Restriction fragments separated by
PFGE and transferred onto nylon membranes were detected by sequential
hybridizations of the filter to several DNA probes which included: DNA probes
specific for the left and right arm of the pYAC4 vector; insert termini for internal
YAC clones; internal probes and cosmid subclones; and an Alu-specific probe. The
positions and names of all the probes used in the long-range restriction analysis are
shown in Figure 8. Based on this analysis the internal deletion for YAC 351B10
was confirmed. The extent of overlap between the YAC clones was determined.
The size of the critical SCA1 region was estimated to be 1.2 Mb. Internal deletions
and/or other rearrangements could not be excluded for the areas where a single YAC
was analyzed by restriction enzyme analysis. These include approximately a 220 kb
region within YAC 195B5 and a 335 kb region within YAC 379C2.

III. Expansion of an Unstable Trinucleotide Repeat in SCA1

A. Methods

1. Screening for trinucleotide repeats

Genomic DNA from YACs was partially digested with MboI and
cloned into cosmid vector SuperCos 1 (Stratagene) following the manufacturer's
protocol. Clones containing human inserts were identified by hybridization with
radiolabeled human DNA and were arrayed on a gridded plate. Filter lifts of cosmid
clones from YAC 227B1 were screened for the presence of trinucleotide repeats by
hybridization to [γ-32P] end-labelled (GCT)7 oligonucleotide. In a parallel
experiment, a mixture of 10 oligonucleotides representing the various permutations
of trinucleotide repeats were end-labelled and hybridized to a Southern transfer of
EcoRI-digested cosmids from YACs 195B5 and A250D5. Hybridizations were
done in a solution of 1 M NaCl, 1% sodium dodecyl sulfate (SDS) and 10% (w/v)
dextran sulphate. Filters were washed in 2xSSC (1xSSC is 0.15 M sodium chloride
and 0.015 M sodium citrate) and 0.1% SDS at room temperature for 15 minutes,
followed by a 15 minute wash at room temperature in a solution prewarmed to
67°C. Both strategies identified several positive clones, 22 of which were
overlapping and contained the same 3.36-kb EcoRI fragment which hybridized to
the (GCT)7 probe and ultimately proved to have the CAG repeat by sequence
analysis.
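
A (GCT)7 probe detects a CAG repeat because the reverse complement of a GCT tract is an AGC tract, which is a CAG tract read in a shifted frame. The same reasoning explains why 10 oligonucleotides are enough to cover all trinucleotide repeats: once frame shifts and the complementary strand are treated as equivalent, the 60 non-homopolymer triplets fall into 10 classes. The sketch below checks both points; the function names are hypothetical.

```python
import re
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_repeat_run(seq: str, unit: str) -> int:
    """Largest n such that `unit` occurs n times in tandem somewhere in seq."""
    # The lookahead lets a run start at every position, in any frame.
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    return max((len(m.group(1)) // len(unit) for m in pattern.finditer(seq)),
               default=0)


def repeat_class(unit: str) -> str:
    """Canonical name for a repeat unit: the alphabetically smallest rotation
    of the unit or of its reverse complement."""
    candidates = [u[i:] + u[:i]
                  for u in (unit, reverse_complement(unit))
                  for i in range(len(unit))]
    return min(candidates)


def trinucleotide_repeat_classes() -> list:
    """Distinct trinucleotide repeat classes, homopolymers excluded."""
    units = ("".join(p) for p in product("ACGT", repeat=3))
    return sorted({repeat_class(u) for u in units if len(set(u)) > 1})


probe = "GCT" * 7
target_strand = reverse_complement(probe)       # the strand the probe anneals to
cag_units = longest_repeat_run(target_strand, "CAG")
classes = trinucleotide_repeat_classes()
```

Here the 21-nt probe pairs with a strand carrying six complete CAG units, and `repeat_class("CAG")` and `repeat_class("GCT")` both reduce to the same class.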

2. Genomic digests and Southern blots

Genomic DNAs were digested with TaqI (Boehringer Mannheim,
Indianapolis, IN) or BstNI (New England Biolabs, Beverly, MA) according to the
manufacturers' recommendations. Southern blotting was done following standard
protocols.

30 

3. DNA sequencing

To determine the DNA sequence in the region containing and flanking
the CAG trinucleotide repeats, clone pGCT-7, containing the 3.36-kb EcoRI
fragment, was subcloned. A 400-bp fragment with CAG trinucleotide repeats was
generated from pGCT-7 by Sau3AI digestion and subcloned into the BamHI site of
pBluescript KS- (Stratagene, La Jolla, CA) (clone pGCT-7.s1). In addition, pGCT-7
was digested with PstI to remove 1.3 kb of DNA and recircularized for
transformation (clone pGCT-7.p2). The position of the trinucleotide repeats was
determined by PCR using (GCT)7 oligonucleotide and one of the flanking
sequencing primers as PCR primers. Initial results indicated that the CAG
trinucleotide repeats were on the reverse primer strand, about 1.3 kb from the
reverse primer, that is, 400 bp from the PstI site. DNA sequencing was performed
by the dideoxynucleotide chain-termination method using Sequenase and the ΔTaq
Cycle-Sequencing kit (United States Biochemical, Cleveland, OH). Both universal (-40)
and reverse primers were used for clone pGCT-7.s1, while only the universal (-40)
primer was used for sequencing pGCT-7.p2.

4. RT-PCR and Northern analysis

Total RNA was extracted from lymphoblastoid cells using
guanidinium thiocyanate followed by centrifugation in a cesium chloride gradient.
Poly(A)+ RNA was selected using Dynabeads oligo(dT)25 from Dynal (Great Neck,
NY). First strand cDNA synthesis was carried out using MMLV reverse
transcriptase (BRL, Gaithersburg, MD). RT-PCR was carried out using hot start
PCR with three cycles of: 97°C for 1 minute, 59°C for 1 minute, and 72°C for 1
minute for the Pre1 and Pre2 primer set. Following that, 33 cycles of 94°C for 1
minute, 57°C for 1 minute, and 72°C for 1 minute were carried out. For the Rep1
and Rep2 primer pair the same PCR cycling conditions were followed at lower
annealing temperatures of 57°C and 55°C respectively. The RT-PCR products were
analyzed on 6% Nusieve agarose gel. The northern blot containing various human
tissues was purchased from Clontech (Palo Alto, CA).

5. PCR Analysis

Fifty ng of genomic DNA was mixed with 5 pmol of each primer
(CAG-a/CAG-b or Rep-1/Rep-2) in a total volume of 20 µl containing 1.5 mM
MgCl2, 300 µM dNTPs (1.25 mM MgCl2 and 250 µM dNTPs for Rep-1/Rep-2
primers), 50 mM KCl, 10 mM Tris-HCl pH 8.3, and 1 unit of Amplitaq (Perkin
Elmer, Norwalk, CT). For the CAG-a/CAG-b primer pair [α-32P]dCTP was
incorporated in the PCR reaction; for the Rep-1/Rep-2 primer pair the Rep-1 primer was
labeled at the 5' end with [γ-32P]dATP. Formamide was used at a final
concentration of 2% when using the Rep-1/Rep-2 primer pair. Samples, overlaid
with mineral oil, were denatured at 94°C for 4 minutes followed by 30 cycles of
denaturation (94°C, 1 minute), annealing (55°C, 1 minute), and extension (72°C, 2
minutes). Six microliters (µl) of each PCR reaction was mixed with 4 µl formamide
loading buffer, denatured at 90°C for 2 minutes, and electrophoresed through a 6%
polyacrylamide/7.65 M urea DNA sequencing gel. Allele sizes were determined by
comparing migration relative to an M13 sequencing ladder.
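
The reaction set-up and cycling program above reduce to simple arithmetic. The following Python sketch is illustrative only: it converts the stated 5 pmol of primer in 20 µl to a molar concentration and totals the programmed hold times. All variable and function names are hypothetical, and ramp times between temperatures are not given in the text and are therefore ignored.

```python
# Illustrative sketch only; names are hypothetical and not part of the protocol.
# Values are taken from the PCR analysis paragraph above.

PRIMER_PMOL = 5           # pmol of each primer per reaction
REACTION_VOLUME_UL = 20   # total reaction volume in microliters

# pmol per microliter is numerically equal to micromolar (1 pmol/ul = 1 uM).
primer_um = PRIMER_PMOL / REACTION_VOLUME_UL

INITIAL_DENATURATION_MIN = 4
CYCLES = 30
# (step name, temperature in deg C, hold time in minutes)
CYCLE_STEPS = [
    ("denaturation", 94, 1),
    ("annealing", 55, 1),
    ("extension", 72, 2),
]


def programmed_minutes(initial_min, cycles, steps):
    """Return total hold time in minutes, ignoring ramping between steps."""
    per_cycle = sum(minutes for _name, _temp_c, minutes in steps)
    return initial_min + cycles * per_cycle


total_min = programmed_minutes(INITIAL_DENATURATION_MIN, CYCLES, CYCLE_STEPS)
print(f"primer concentration: {primer_um} uM")
print(f"programmed hold time: {total_min} min")
```

Under these assumptions each primer is present at 0.25 µM and the program holds for 124 minutes in total.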

B. Results

1. Cloning of the CAG repeat region in SCA1

As discussed above, in efforts to clone the SCA1 gene, key
recombination events were analyzed using several dinucleotide repeat
polymorphisms mapping to 6p22-p23 to identify the minimal region likely to
contain the SCA1 gene. This analysis revealed that there were no recombination
events between SCA1 and the centromeric marker D6S288 in five large kindreds or
between SCA1 and the telomeric marker AM10GA in nine large kindreds. A single
recombination event was detected between D6S274 and D6S288, identifying the
closest flanking marker at the centromeric end to be D6S274. At the telomeric end,
a single recombination event was detected between AM10GA and D6S89 and
identified the latter as the flanking marker. A yeast artificial chromosome (YAC)
contig extending from D6S274 to D6S89 and spanning the entire SCA1 candidate
region was developed. A subset of the YAC clones encompassing this region is
shown in Figure 9. Long-range restriction analysis determined the size of the SCA1
candidate region to be approximately 1.2 Mb. Cosmid libraries were constructed
from YACs 227B1, 195B5, A250D5, and 379G2. Arrays of cosmid clones
containing human inserts were hybridized with an oligonucleotide consisting of
tandemly repeated CAG, as well as with oligonucleotides containing other
trinucleotide repeats. Several hybridizing cosmid clones were identified, 22 of
which were positive for the CAG repeat and mapped to the region between D6S288
and AM10GA (Figure 9). All 22 of these clones shared a common 3.36-kb EcoRI
fragment that specifically hybridized to the CAG repeat.

2. Variability of the CAG Repeat Using Southern Analysis

To test the genetic stability of this repeat in SCA1, we used Southern
blotting analysis to examine families with juvenile onset SCA1. A two-generation
reduced pedigree from the TX-SCA1 family is shown in Figure 10a. Paternal
transmission of SCA1 with an expansion of a TaqI fragment was noted. A 2830-bp
fragment was detected in DNA from the unaffected spouse and on the normal
chromosome from SCA1 patients, whereas a 2930-bp fragment was found in DNA
from the affected father (onset at 25 years) and a 3000-bp fragment was detected in
DNA from his affected child with an onset at 4 years. In a second SCA1 kindred,
family MN-SCA1 (Figure 10b), two offspring inherited SCA1 from their father and
differed in their age at onset (25 years and 9 years). These individuals also differ in
the size of the amplified TaqI fragment they inherited from their affected father,
2900 bp and 2970 bp, respectively.

Enlargement of the (CAG)n-containing fragment on SCA1
chromosomes from the same TX-SCA1 juvenile onset family was also demonstrated
by Southern analysis following BstNI digestion. The BstNI fragment is 530 bp on
normal chromosomes, 610 bp in the SCA1 affected father, and 680 bp in the
affected juvenile onset offspring (Figure 10c). In each of these families,
nonpaternity was excluded by genotypic analysis with a large number (greater than
10) of dinucleotide repeat markers. In addition, the size of the (CAG)n-containing
TaqI fragment in DNA from 30 unaffected spouses was compared to the sizes of the
repeat-containing TaqI fragment in DNA from 62 individuals affected with
late-onset SCA1. The affected individuals are from five different SCA1 families:
LA-SCA1, MI-SCA1, MN-SCA1, MS-SCA1, and TX-SCA1. In all 30 unaffected
spouses fragment sizes were approximately 2830 bp and no expansions or
reductions were detected with transmission to offspring. In contrast, DNA from 58
of the 62 SCA1 affected individuals contained detectably expanded TaqI fragments
ranging in size from 2860 bp to 3000 bp in addition to the 2830-bp fragment. The
DNAs from the remaining four individuals were found to have an expansion when
analyzed by polymerase chain reaction (PCR). The expanded fragment always
segregated with disease, and in some cases the fragment expanded further in
successive generations. In the juvenile cases the expanded restriction fragment was
larger than that in the affected parent (uniformly the father in the cases analyzed),
supporting the conclusion that a DNA sequence expansion is the mutational basis of
SCA1.
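
The fragment sizes reported for the TX-SCA1 family can be cross-checked between the two digests. The Python sketch below is illustrative only (the dictionary layout and function name are hypothetical); it subtracts the normal fragment size from each expanded fragment and compares the father-to-child increase seen with TaqI and with BstNI. Southern blot sizes are estimates, so agreement is expected only approximately.

```python
# Illustrative sketch only; sizes (bp) are those quoted in the text for the
# TX-SCA1 family, and the data layout is a hypothetical convenience.
fragments_bp = {
    "TaqI": {"normal": 2830, "father": 2930, "child": 3000},
    "BstNI": {"normal": 530, "father": 610, "child": 680},
}


def expansion_bp(sizes):
    """Size increase of each expanded fragment over the normal fragment."""
    return {who: sizes[who] - sizes["normal"] for who in ("father", "child")}


expansions = {enzyme: expansion_bp(sizes) for enzyme, sizes in fragments_bp.items()}

# Increase between generations, which should not depend on the enzyme used.
father_to_child = {
    enzyme: exp["child"] - exp["father"] for enzyme, exp in expansions.items()
}

for enzyme in fragments_bp:
    print(enzyme, expansions[enzyme], "father-to-child:", father_to_child[enzyme])
```

With the quoted sizes, both digests show the same 70-bp increase between father and child, which is what would be expected if a single sequence within both fragments had expanded.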

3. Genomic DNA analysis of repeat regions

To identify the region involved in the DNA expansion, a 500-bp
(CAG)n-containing subclone of the 3.36-kb EcoRI fragment was sequenced, as was
the entire 3.36-kb fragment (Figure 1). This normal allele demonstrated 30 CAG
repeat units. In two of the repeat units (positions 13 and 15) a T was present instead
of a G.
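
As an illustration of how such an allele can be described computationally, the following Python sketch rebuilds a tract of 30 repeat units with CAT in place of CAG at units 13 and 15 and then counts the units. It is a minimal sketch, not the method used in this work: the flanking bases and all function names are hypothetical, and CAT units are counted as part of the repeat tract by assumption.

```python
import re

# Rebuild the normal allele described above: 30 repeat units, with a T in
# place of the G (CAT instead of CAG) in units 13 and 15 (1-based).
units = ["CAG"] * 30
for position in (13, 15):
    units[position - 1] = "CAT"
repeat_tract = "".join(units)

# Hypothetical flanking bases, chosen only so the example has some context.
sequence = "TTTT" + repeat_tract + "AAAA"


def count_repeat_units(seq):
    """Length, in triplets, of the longest run of CAG/CAT units in seq."""
    runs = re.findall(r"(?:CA[GT])+", seq)
    return max((len(run) // 3 for run in runs), default=0)


def interruption_positions(tract):
    """1-based positions of CAT units within a tract made of triplets."""
    triplets = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    return [i + 1 for i, unit in enumerate(triplets) if unit == "CAT"]


print(count_repeat_units(sequence))
print(interruption_positions(repeat_tract))
```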

The expansion of the trinucleotide repeat was observed in all affected
individuals examined by PCR from five different kindreds representing at least two
ethnic backgrounds, American Black and Caucasian. Genotypic analysis using
DNA markers that are very closely linked to SCA1 (D6S274, D6S288, AM10GA,
D6S89 and SB1) revealed that there are four haplotypes segregating with disease
among the five families analyzed.

4. The trinucleotide repeat is transcribed

To test whether the CAG repeat lies within a gene, reverse
transcription-PCR (RT-PCR) was performed using primers immediately flanking
the repeat (Rep1 and Rep2) as well as primers which amplify a sequence
immediately adjacent to the repeat (Pre1 and Pre2). The RT-PCR analysis confirms
that the CAG repeat is present in mRNA from lymphoblasts. Furthermore, northern
blot analysis of human poly(A)+ RNA from various tissues, using a 1.1 kb subclone
(C208-1.1) from the 3.36-kb EcoRI fragment as a probe, identified a 10 kb transcript
which is expressed in brain, skeletal muscle, placenta and to a lesser extent in
kidney, lung and heart. The expression of this transcript is considerable in skeletal
muscle. When the 3.36-kb EcoRI fragment was used as a probe on the northern blot
the same size transcript was detected.
5. PCR analysis of the CAG repeat

To confirm that the CAG repeats were involved in the observed length
variation, we analyzed the size of PCR-amplified fragments in 45 unaffected
spouses and 31 SCA1 affected individuals using synthetic oligonucleotides that
flank the CAG repeat. One pair of primers (CAG-a/CAG-b) was located within
9 bp of the repeats and identified length variation, indicating that the CAG repeats
are the basis of the variation.

Normal individuals displayed 11 alleles ranging from 25 to 36 repeat
units (Table 8). Heterozygosity in normal individuals was 84%. Examination of
this sequence in 31 individuals affected with SCA1 demonstrated that each was a
heterozygote with one allele within the size range seen in the normal individuals and
a second expanded allele within a range of 43 to 81 repeat units (Figure 11). Late
onset SCA1 individuals showed at least 43 repeats, while 59-81 units were found in
the juvenile cases. Figure 12 depicts the correlation between the age-at-onset and the
number of repeat units. A linear correlation coefficient (r) of -0.845 was
obtained, indicating that 71.4% (r²) of the variation in the age-at-onset can be
accounted for by the number of (CAG)n repeat units. The largest trinucleotide
repeat expansions were noted in SCA1 patients with juvenile onset, who typically had
a more rapid course. It is of interest that all of these patients were offspring of
affected males, which is reminiscent of Huntington disease, where there is a
preponderance of male transmission in juvenile cases.
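
The relation between the reported correlation coefficient and the fraction of variation explained is a one-line calculation. The Python sketch below is illustrative only (variable names are hypothetical); it squares the reported value of r and expresses the result as a percentage.

```python
# Illustrative check of the coefficient of determination quoted above.
r = -0.845                      # linear correlation coefficient from the text
r_squared = r ** 2              # fraction of variance in age-at-onset explained
percent_explained = round(100 * r_squared, 1)

print(f"r^2 = {r_squared:.6f}, i.e. about {percent_explained}%")
```

The negative sign of r indicates that a larger number of repeat units is associated with an earlier age at onset; the sign is lost on squaring.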
Sequence analysis of the fragment containing the CAG repeat
indicated that there are several extended open reading frames. Translation of the 
repeat in one of these frames (389-bp) would encode polyglutamine. 

Table 8.

Comparison of the number of CAG repeat units
on normal and SCA1 chromosomes

Number of      Normal Chromosomes        SCA1 Chromosomes
Repeats        Number    Frequency       Number    Frequency

>60               0        0                4        0.13
50-59             0        0               17        0.55
43-49             0        0               10        0.32
37-42             0        0                0        0
35-36             1        0.01             0        0
30-34            49        0.55             0        0
<29              40        0.44             0        0
TOTAL            90        1.00            31        1.00
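
The frequencies in Table 8 follow from the counts and the column totals. The Python sketch below is an illustrative recomputation only; the tuple layout and names are hypothetical. It checks the totals and compares each recomputed frequency with the printed one. Agreement is expected only to within rounding (for example, 49/90 is about 0.544 and is printed as 0.55).

```python
# Illustrative recomputation of Table 8; the data layout is hypothetical.
# (repeat class, normal count, SCA1 count, printed normal freq, printed SCA1 freq)
TABLE_8 = [
    (">60",    0,  4, 0.00, 0.13),
    ("50-59",  0, 17, 0.00, 0.55),
    ("43-49",  0, 10, 0.00, 0.32),
    ("37-42",  0,  0, 0.00, 0.00),
    ("35-36",  1,  0, 0.01, 0.00),
    ("30-34", 49,  0, 0.55, 0.00),
    ("<29",   40,  0, 0.44, 0.00),
]

normal_total = sum(row[1] for row in TABLE_8)
sca1_total = sum(row[2] for row in TABLE_8)

# Largest absolute difference between a recomputed and a printed frequency.
max_diff = max(
    max(abs(n / normal_total - printed_n), abs(s / sca1_total - printed_s))
    for _label, n, s, printed_n, printed_s in TABLE_8
)

print(normal_total, sca1_total, round(max_diff, 4))
```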



IV. Isolation of SCA1 cDNA
A. Methods

1. Screening of cDNA libraries.

Three cDNA libraries were screened: a human fetal brain library
from Stratagene (La Jolla, CA), a human fetal brain library constructed in λ-Zap II
with the inserts cloned into the NotI restriction site (provided by Dr. Cheng Chi Lee
at Baylor College of Medicine), and an adult cerebellar cDNA library from
Clontech (Palo Alto, CA). The libraries were plated on 150 mm plates at a density
of 50,000 pfu per plate using bacterial strain LE392 (ATCC number 33572).
Hybond-N filters (Amersham, Arlington Heights, IL) were used to carry out plaque
lifts. The fragments used as probes in the first screening included a mixture of two
polymerase chain reaction (PCR) products obtained by using the primers Rep1 and
Rep2 (Figure 3) immediately flanking the repeat and the primers Pre1 and Pre2
(Figure 3) which amplify a sequence immediately adjacent to the repeat, and a 1.1
kb subclone of the 3.36-kb EcoRI fragment (Figure 1). The 1.1 kb fragment
(C208-1.1) is located 540 bp 3' to the CAG repeat. A 9-kb EcoRI genomic fragment
derived from the same cosmids containing the CAG repeat was also used in this
screening. Subsequent rounds of screening were carried out on the same libraries
using as probes cDNA clones 31-5, 3J, 3c7-2 and 3c7 (Figure 13). Genomic and
cDNA probes were labeled using the random priming technique described in A.P.
Feinberg et al., Anal. Biochem., 122, 266-267 (1984). Repetitive sequences were
blocked as described in P.G. Sealy et al., Nucl. Acids Res., 12, 1905-1922 (1985).
Briefly, the probes were reassociated with a large excess of sheared human placental
DNA. The nonrepetitive regions remained single-stranded and no separation of the
single-stranded fragments from the reassociated fragments was necessary in order to
allow the signal from low copy number components to be detected in subsequent
transfer hybridizations. Hybridization of the filters was then carried out following
standard protocols as described in H.Y. Zoghbi, et al., Am. J. Hum. Genet., 42,
877-883 (1988).

2. DNA sequencing and sequence analysis.

Shotgun libraries were constructed in M13 as described in A.T.
Bankier, et al., Meth. Enzymol., 55-93 (1987) for each of the following cDNA
clones: 8-8, 31-5, 3c5, 3c7-1, 3J, 3c7-2, 3c7 (Figure 13). Twenty to thirty M13
subclones were sequenced for each cDNA clone using an Applied Biosystems ABI
370A automated fluorescent sequencer, as described in R. Gibbs, et al., Proc. Natl.
Acad. Sci. U.S.A., 1919-1923 (1989). Some cDNA clones (8-9b, 8-9a, AX1,
B21, B11, 3c28) were partially sequenced manually using a Sequenase sequencing
kit (USB, Cleveland, OH) on double-stranded templates, according to the
manufacturer's recommendations. The sequence coverage in terms of numbers of
cDNA/genomic clones analyzed was 3-4X in the coding region and 5'UTR and 2X in the
3'UTR. All RT-PCR, 5'-RACE-PCR and inverse-PCR products were sequenced
manually after subcloning into SmaI-digested pBluescript SK- plasmid (Stratagene,
La Jolla, CA) modified using the T-vector protocol as described in D. Marchuk et
al., Nucl. Acids Res., 1154 (1990). Use of this protocol facilitates cloning.
Briefly, Taq polymerase ordinarily causes a template-independent addition of
adenosine at the 3' end of the PCR product, making blunt end ligations difficult. In
the T-vector protocol, a thymidine is added to the 3' end of a digested plasmid. The
result is a one-base sticky end complementary to the 3' adenosine in the PCR
product, which greatly increases cloning efficiency.

Data base searches were carried out using the GCG software package
(Genetics Computer Group, Madison, WI) and the BLAST network service from the
National Center for Biotechnology Information (S.F. Altschul, et al., J. Mol. Biol.,
215, 403-410 (1990)). The sequence of the SCA1 transcript has been deposited in
GenBank, accession number X79204.

3. Northern blot, RT-PCR and genomic PCR analyses.

The northern blot of poly-(A)+ RNA from various human tissues and
the poly-(A)+ RNA from adult human cerebellum were purchased from Clontech
(Palo Alto, CA). Poly-(A)+ RNA from human lymphoblastoid cells was prepared by
first extracting total RNA using guanidinium thiocyanate, followed by
centrifugation in a cesium chloride gradient (P. Chomczynski et al., Anal. Biochem.,
156-159 (1987)). Poly-(A)+ RNA was selected using Dynabeads oligo(dT)25
from Dynal (Great Neck, NY). First strand randomly primed cDNA synthesis was
carried out using MMLV (Moloney murine leukemia virus) reverse transcriptase
(BRL, Gaithersburg, MD). This was conducted in a 20 µl reaction mixture
containing 3 µg RNA, first strand buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3
mM MgCl2) (BRL, Gaithersburg, MD), 10 mM dithiothreitol (BRL, Gaithersburg,
MD), 1 µM 3' end primer, 0.5 units RNasin (Promega, Madison, WI), 5.0 units
MMLV reverse transcriptase (BRL, Gaithersburg, MD), and 250 µM each
deoxynucleotide triphosphate: dGTP, dATP, dCTP, dTTP. The mixture was
incubated for 20 minutes at 37°C then put on ice. A 10 µl aliquot was used for the
PCR reaction. First strand randomly primed cDNAs from human brain, liver and
adrenal were provided by Dr. G. Borsani (Baylor College of Medicine).
RT-PCR for detection of alternative splicing was carried out with
primers 9b and 5R and with primers 5F and 5R (Figure 15) under the following
conditions: an initial denaturation step at 94°C for 5 minutes followed by 30 cycles of 94°C
for 1 minute, 60°C for 1 minute and 72°C for 2 minutes. The reaction mixture
contained 10 µl cDNA, PCR buffer (50 mM KCl, 10 mM Tris-HCl, pH 8.3, 1.25
mM MgCl2), 1 µM of the relevant 3' primer (primer 5R), 2% formamide and 1.25
units Amplitaq enzyme (Perkin Elmer, Norwalk, CT).

RT-PCR on lymphoblastoid cell lines with primers Rep1 and Rep2
for detection of expression of SCA1 mRNA was carried out using "hot start" PCR
with three cycles of: 97°C for 1 minute, 57°C for 1 minute and 72°C for 1 minute.
Following that, 33 cycles of 94°C for 1 minute, 55°C for 1 minute and 72°C for 1
minute were carried out. Twenty microliters of the PCR reactions was then resolved
on a 2% agarose gel (2 g Ultrapure agarose (BRL, Gaithersburg, MD) in 40 mM
Tris-acetate, 1 mM EDTA, pH 8.0) and blotted onto Sureblot membrane (Oncor,
Gaithersburg, MD). The filter was hybridized with a (GCT)7 oligonucleotide
end-labeled with γ-32P-ATP. Hybridizations were done in a solution of 1 M NaCl,
1% sodium dodecyl sulfate (SDS) (Sigma Chemical Company, St. Louis, MO) and
10% (w/v) dextran sulphate (Sigma Chemical Company, St. Louis, MO). Filters
were washed in 2 x SSC (1 x SSC is 0.15 M sodium chloride and 0.015 M sodium
citrate) and 0.1% SDS at room temperature for 15 minutes, followed by a 15 minute
wash at room temperature in a solution prewarmed to 67°C.

B. Results

Two human fetal brain cDNA libraries were screened using as probes
various DNA fragments from the cosmid clone shown to contain the CAG repeat.
Five cDNA clones were identified; these included clone 31-5 containing the CAG
repeat, and clone 3J which was found not to overlap with 31-5 (Figure 13).
Northern blot analysis revealed that clones 31-5 and 3J identified the same 11-kb
transcript detectable in all tissues examined (Figure 14). Accordingly, the same two
human fetal brain cDNA libraries and a human adult cerebellar cDNA library were
used for several rounds of screening in order to obtain the full length transcript. As a
result, 22 cDNA clones were isolated and characterized by sequence and PCR
analyses to assemble a contig spanning the SCA1 transcript. Twelve of the phage
clones spanning the cDNA contig are shown in Figure 13. These clones were
sequenced, allowing the assembly of the entire sequence of the SCA1 cDNA, which
spans 10,660 bp (Figure 15).

Sequence analysis revealed a coding region of 2448 bp starting with
a putative ATG initiator codon at base 936 located within a nucleotide sequence that
fulfills Kozak's criteria for an initiation codon (M. Kozak, J. Cell. Biol., 108,
229-241 (1989)). An in-frame stop codon is present 57 bp upstream of that ATG in three
independent cDNA clones as well as in genomic DNA. Furthermore, both the ATG
at the beginning of the coding region and the upstream stop codon have been found
in the murine homologue of SCA1 in the murine fetal brain library (Stratagene, La
Jolla, CA). The SCA1 gene therefore encodes a polypeptide of about 816 amino
acids, with an expected size of 87 kD, designated ataxin-1. However, one cannot
exclude the possibility that the coding region begins at any of the other ATGs,
located downstream of the first methionine, which would result in a smaller protein.

The CAG repeat is located within the coding region 588 bp from the
first methionine and encodes a polyglutamine tract. The open reading frame ends
with a TAG stop codon at base 3384. Therefore, this transcript has a 5' untranslated
region (5'UTR) of 935 bp and a 3' untranslated region (3'UTR) of 7277 bp. The
transcript ends with a tail of 57 adenosine residues; a polyadenylation signal,
AATAAA, is found 23 nucleotides upstream of the poly(A) tail. Homology
searches using both the DNA sequence of the coding region and the predicted
protein sequence (lacking the CAG repeat and the polyglutamine tract, respectively)
revealed no significant homology with other known proteins in the data base.
Analysis of the sequence of ataxin-1 failed to reveal the presence of any strong
phosphorylation sites or any specific motifs such as DNA binding or RNA
binding domains. The putative secondary structure of this protein is compatible
with that of a soluble protein, as no hydrophobic domains were identified. A DNA
sequence data base search revealed an identity between 380 bp in the 3'UTR of the
SCA1 transcript and an expressed sequence tag (EST04379) isolated from a human
fetal brain cDNA library (M.D. Adams et al., Nature Genet., 4, 256-267
(1993)).
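
The lengths and coordinates quoted for the SCA1 transcript can be checked against one another. The Python sketch below is illustrative only; the constant names are hypothetical, and it assumes that the 2448-bp coding region excludes the stop codon and that the stated 3'UTR length is counted from the stop codon at base 3384 to the end of the 10,660-bp sequence. Under that reading the figures are mutually consistent.

```python
# Illustrative consistency check of the transcript coordinates quoted above.
# Constant names are hypothetical; values are taken from the text.
TRANSCRIPT_BP = 10660
FIVE_UTR_BP = 935
CODING_BP = 2448
THREE_UTR_BP = 7277
START_CODON_BASE = 936
STOP_CODON_BASE = 3384

codons = CODING_BP // 3                                # sense codons
last_coding_base = START_CODON_BASE + CODING_BP - 1    # end of coding stretch
bases_from_stop_to_end = TRANSCRIPT_BP - STOP_CODON_BASE + 1

print("codons:", codons)
print("last coding base:", last_coding_base)
print("bases from stop codon to 3' end:", bases_from_stop_to_end)
print("sum of parts:", FIVE_UTR_BP + CODING_BP + THREE_UTR_BP)
```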
V. Organization of the SCA1 Transcript: Evidence for Alternative
Splicing in the 5'UTR
A. Methods

1. 5'-RACE-PCR

First strand cDNA was prepared from 1 mg of poly-(A)+ RNA from
human adult cerebellum (Clontech, Palo Alto, CA) using the primer 5R (Figure 15)
as described in Example IV. 5'-RACE-PCR was carried out as described in M.A.
Frohman in PCR Protocols: A Guide to Methods and Applications, M.A. Innis, et
al., Eds.; Academic Press: San Diego (1990) using SCA1 primers 5a and X4-1
(Table 9) as specific primers. The product was then electrophoresed through a 1.2%
agarose gel, blotted onto SureBlot hybridization membrane (Oncor, Gaithersburg,
MD) as described in Example II above, and then, to test the specificity of the
product, hybridized to a SCA1-specific probe represented by a PCR product
spanning 118 bp between primer 9b in exon 1 and primer X3-1 (Table 9) in exon 3.



Table 9.

Primer sequences for inverse-PCR

Exon   Primer 1                        Primer 2

2      X2-1 (181-164)                  X2-2 (185-203)
       GTAGTAGTTTTTGTGAGG              CACCAAGCTCCCTGATGGA

3      X3-1 (246-229)                  X3-2 (277-296)
       GCTTGAATGGACCACCCT              ATCTCCTCCTCCACTGCCAC

4      X4-1 (347-329)                  X4-4 (407-425)
       AGACTCTTTCACTATGCTC             TTCAGCCTGCACGGATGGT

5      5a (482-463)                    5-2 (519-538)
       TGGCAGTGGAGAATCTCAGT            TGCTGCAAGGAACTGATAGC

6      10a (598-580)                   10b (607-625)
       AATGGTCTAATTTCTTTGG             GAGAAAGAAATCGACGTGC

7      6-1 (714-695)                   X5-2 (723-742)
       ACAGGCTCTGGAGGGCTCCT            TCCATGGTGAAGTATAGGCT

9      9-1 (2919-2900)                 9-2 (2939-2957)
       AGCAGGATGACCAGCCCTGT            GCTCTTTGATTTGCCGTGT

All primers are read in the 5' to 3' direction. Numbers in parentheses represent
the coordinates of each primer within the SCA1 cDNA sequence (Figure 15).
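
Because each primer is listed with its cDNA coordinates, the primer lengths can be checked against the coordinate spans. The Python sketch below is illustrative only; the dictionary layout and function name are hypothetical. A coordinate pair written high-to-low denotes a primer on the reverse strand, so the span is taken as an absolute difference.

```python
# Illustrative check that each primer's length matches its coordinate span.
# name: (first coordinate, second coordinate, sequence written 5' to 3')
PRIMERS = {
    "X2-1": (181, 164, "GTAGTAGTTTTTGTGAGG"),
    "X2-2": (185, 203, "CACCAAGCTCCCTGATGGA"),
    "X3-1": (246, 229, "GCTTGAATGGACCACCCT"),
    "X3-2": (277, 296, "ATCTCCTCCTCCACTGCCAC"),
    "X4-1": (347, 329, "AGACTCTTTCACTATGCTC"),
    "X4-4": (407, 425, "TTCAGCCTGCACGGATGGT"),
    "5a":   (482, 463, "TGGCAGTGGAGAATCTCAGT"),
    "5-2":  (519, 538, "TGCTGCAAGGAACTGATAGC"),
    "10a":  (598, 580, "AATGGTCTAATTTCTTTGG"),
    "10b":  (607, 625, "GAGAAAGAAATCGACGTGC"),
    "6-1":  (714, 695, "ACAGGCTCTGGAGGGCTCCT"),
    "X5-2": (723, 742, "TCCATGGTGAAGTATAGGCT"),
    "9-1":  (2919, 2900, "AGCAGGATGACCAGCCCTGT"),
    "9-2":  (2939, 2957, "GCTCTTTGATTTGCCGTGT"),
}


def span_length(first, second):
    """Number of bases covered by an inclusive coordinate pair."""
    return abs(first - second) + 1


# Primers whose sequence length disagrees with the coordinate span.
mismatches = [
    name
    for name, (first, second, seq) in PRIMERS.items()
    if span_length(first, second) != len(seq)
]

# Primers listed high-to-low, i.e. written on the reverse strand.
reverse_strand = [name for name, (a, b, _seq) in PRIMERS.items() if a > b]

print("mismatches:", mismatches)
print("reverse-strand primers:", reverse_strand)
```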
B. Results

To characterize the genomic region flanking the CAG repeat, the
3.36-kb EcoRI genomic fragment known to contain this repeat was completely
sequenced. Alignment of this genomic sequence with the cDNA sequence allowed
us to determine that the 3.36-kb EcoRI fragment contains a 2080-bp exon which has
160 bp of 5'UTR, the first potential initiation codon and the first 1920 bp of the
coding region. The rest of the coding region lies within the next downstream exon
as detected by PCR analysis on genomic DNA. The last coding exon, which maps
to a 9-kb EcoRI fragment in genomic DNA, also contains 7277 bp of 3'UTR for a
total length of 7805 bp (Figure 16a).
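
The exon sizes given above can be reconciled with the 2448-bp coding region reported in Example IV. The Python sketch below is an illustrative check only, with hypothetical constant names: it splits the coding region between the two coding exons and compares the resulting exon lengths with the stated ones.

```python
# Illustrative check of the two coding exons; names are hypothetical and the
# values are those quoted in the text.
CODING_BP = 2448               # total coding region (Example IV)
FIVE_UTR_IN_EXON_BP = 160      # 5'UTR carried by the 2080-bp exon
CODING_IN_FIRST_EXON_BP = 1920
THREE_UTR_BP = 7277

first_coding_exon_bp = FIVE_UTR_IN_EXON_BP + CODING_IN_FIRST_EXON_BP
coding_in_last_exon_bp = CODING_BP - CODING_IN_FIRST_EXON_BP
last_exon_bp = coding_in_last_exon_bp + THREE_UTR_BP

print(first_coding_exon_bp, coding_in_last_exon_bp, last_exon_bp)
```

The results, 2080 bp for the first coding exon and 7805 bp for the last exon, match the sizes stated in the text.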

Evidence for alternative splicing in the 5'UTR was initially
suggested based on the hybridization pattern of the two most 5' cDNA clones, 8-8
and 8-9b (Figure 13), to Southern blots containing EcoRI-digested genomic DNA
from total human DNA and YACs spanning the SCA1 region. At least three
strongly hybridizing fragments in addition to the 3.36-kb EcoRI fragment were seen.
As neither of the cDNA clones contains an EcoRI site, this result suggested the
presence of several exons in the 5'UTR of the SCA1 transcript. Given these data
and the unusual length of the 5'UTR, this region was characterized in more detail.

Alignment analysis of the sequence of clones 8-8 and 8-9b revealed
the presence of two different 5' sequences diverging at basepair 322. This result
was highly suggestive of alternative splicing. In order to test this hypothesis,
reverse transcription-PCR (RT-PCR) was performed on mRNA from cerebellar
tissue using the primers indicated in Figure 15. When the primers 9b (specific for
the 8-9b clone) and 5R (present in both clones) were used in the RT-PCR analysis, three
products were obtained: one of the expected size (246 bp) and at least two fragments
of larger size (Figure 16b). The same result was obtained when RT-PCR was
carried out on liver, adrenal, brain and lymphoblast cDNAs. The various RT-PCR
products were cloned and sequenced. Sequence analysis of all these products and
comparison with the sequence of phage clones 8-8 and 8-9b confirmed that they
were the result of alternative splicing. Figure 16a shows the structure of all the
cDNA clones which contain the 5' exons of the SCA1 gene and depicts the splice
variants. Based on sequence analysis of three cDNA clones and characterization of
cerebellar RT-PCR products, five exons (exons 1 through 5) were identified and
their borders in the transcript were determined. Exons 2, 3 and 4 are alternatively
spliced in the clones examined and in cerebellar tissue, whereas exon 5 was present
in all the cDNA clones and RT-PCR products.

Rescreening of cDNA libraries with clones 8-8 and 8-9b as probes
did not yield any additional cDNA clones. To identify additional alternatively
spliced exons in the 5'UTR and to confirm initial results, 5'-RACE-PCR was
carried out on reverse transcribed cerebellar mRNA using primers from the 5' end of
exons 5 and 4. A 218-bp product was identified and its specificity was confirmed
by Southern analysis using an internal PCR product as probe. Sequence analysis of
the 5'-RACE-PCR product, furthermore, confirmed the alternative splicing of two
exons (2 and 3) and allowed the identification of an additional 127 bp at the 5' end
of this gene (Figure 16a).

VI. Identification of Intron-Exon Boundaries and Determination
of the Genomic Structure of SCA1

A. Methods

1. Identification of intron-exon boundaries

The boundaries of exons 2-9 were identified by inverse-PCR. To
carry out inverse-PCR, YAC agarose plugs were digested to completion as
described in M.C. Wapenaar, et al., Hum. Mol. Genet., 2, 947-952 (1993) using
frequent-cutter restriction enzymes such as Sau3AI, TaqI, HaeIII and MspI
purchased from Boehringer Mannheim Biochemicals (Indianapolis, IN) and used as
recommended by the manufacturer. The plugs were then digested with β-agarase I
(USB, Cleveland, OH) following the manufacturer's recommendations and
subsequently phenol-chloroform (Boehringer Mannheim Biochemicals,
Indianapolis, IN) extracted, precipitated with ethanol and resuspended in 12 ml of
TE (TE: 10 mM Tris-HCl, 1 mM EDTA) pH 8. Fifty ng of DNA from each digest
was then circularized according to the published protocol of J. Groden et al., Cell,
66, 589-600 (1991). Diverging PCR primers were designed within the cDNA and
used on the circularized product under the amplification conditions described in J.
Groden et al., Cell, 66, 589-600 (1991). PCR products were then subcloned and
sequenced as described in Example II, above. Inverse-PCR identified all
intron/exon boundaries except the boundary of exon 1. Accordingly, a 9-kb EcoRI
genomic fragment found to contain exon 1 was subcloned from a cosmid derived
from YAC 227B1 (Example II). This subclone was subsequently partially
sequenced to identify the boundary of exon 1.

2. Mapping of cDNA clones to the YACs and cosmids.

Southern blots containing EcoRI-digested DNAs from YACs
spanning the SCA1 critical region as well as Southern blots containing DNAs from
the YACs digested with rare-cutter enzymes (see previous section) were hybridized,
using the standard protocol described in H.Y. Zoghbi et al., Am. J. Hum. Genet., 42,
877-883 (1988), to various SCA1 cDNA clones and to all the genomic fragments
containing the intron-exon boundaries. Briefly, restriction fragments were separated
by electrophoresis on 0.7% agarose gels, denatured and transferred to Nytran
(Schleicher and Schuell, Keene, NH) filters. Probes were 32P-labeled using the
oligohexamer labeling method (A.P. Feinberg et al., Anal. Biochem., 122, 6-13
(1983)). After hybridization the filters were washed and autoradiography was
performed, as described in Zoghbi et al., Am. J. Hum. Genet., 42, 877-883 (1988).

B. Results

Complete sequencing of the 3.36-kb EcoRI fragment provided the
intron-exon boundaries for the 2080-bp exon containing most of the coding region
(Figure 17). In order to determine the actual number of exons and to obtain all of
the intron-exon boundaries, an inverse-PCR strategy was adopted using two
overlapping YAC clones, 227B1 and 149H3, known not to contain any
rearrangements (see Example II). A total of nine exons, seven of which are in the
5'UTR, were identified and splice junctions for exons 1 through 9 were subcloned
and sequenced (Figure 17). The schematic on top of Figure 16a shows the nine
exons and their respective sizes. In the 5' untranslated region, alternative splicing
involves exons 2, 3 and 4, but not exons 5, 6 and 7 in over 5 phage cDNA clones
analyzed. The putative exon 1 encompasses 157 bp and hybridizes very strongly to
an EcoRI fragment derived from hamster genomic DNA.

To study the genomic organization of the SCA1 gene, ten cDNA
clones as well as genomic fragments containing the splice junctions for all the exons
were mapped by Southern analysis and localized on a long range restriction map of
four overlapping YAC clones spanning the SCA1 critical region (Figure 18). This
analysis revealed that the gene spans at least 450 kb of genomic DNA and that the
putative first exon maps to a genomic fragment containing a hypomethylated CpG
island. Detailed restriction analysis of the intron between the two coding exons (8
and 9) revealed that this intron is approximately 4.5 kb in length. The sizes of the
remaining introns were estimated from the long range restriction map and by PCR
analysis and ranged from 650 bp (intron 2) to nearly 200 kb (intron 7) (Figure 18).

VII. Expression of the SCA1 mRNA in SCA1 Patients

As a first step toward understanding the mechanism by which the
expansion of a trinucleotide CAG repeat leads to neurodegeneration in SCA1, the
level of transcription of SCA1 from the expanded alleles in patients was
investigated. RT-PCR was carried out with primers Rep1 and Rep2, which flank the
CAG repeat, as described in Example V using lymphoblastoid mRNAs from SCA1
patients with repeat sizes ranging from 43 to 69. This analysis revealed that mRNA
was expressed from both the normal allele and the expanded allele (Figure 19).

VIII. Cloning of portions of the SCA1 Gene into the pMAL™-c2 Vector

DNA from the SCA1 gene was cloned into the pMAL™-c2 vector
(New England Biolabs, Beverly, MA), which produces a chimeric protein consisting
of the maltose-binding protein fused to the N-terminus of the protein of interest
(ataxin-1) in a linkage that can subsequently be conveniently cleaved. To obtain
DNA for cloning, SCA1 DNA was amplified and isolated from clone 31-5 (Figure 13)
using standard PCR techniques. The manufacturer's instructions were followed in
designing the appropriate oligonucleotide primers (pMAL™ vector Package Insert,
1992 New England Biolabs, revised 4/7/92). In each case an EcoRI linker site was
designed into the 5' primer and a HindIII linker site was designed into the 3' primer
to facilitate cloning. Three different amplification products were obtained. In one,
DNA was isolated utilizing two 20-mer PCR primers, COD and RCOD (Table 10),
that hybridized to the 5' and 3' ends of the coding region, such that the stretch of
DNA being amplified contained residues presumed to encode the entire sequence of
ataxin-1, beginning with Met1 and ending with Lys817 (Figure 15). The amplified
product was then cloned into the EcoRI/HindIII site in the polylinker region of
pMAL™-c2 following instructions provided by the manufacturer. Two other
constructs were made in the same way using PCR to isolate shorter segments of
DNA. In both cases the same 3' end primer was used, but different 5' primers were
employed (Table 10). One 5' primer (3COD) was designed such that the amplified
product began at Met277 (the fourth methionine in the coding region), the other 5'
primer (8COD) such that the amplified product began at Met548. pMAL™-c2 was
transformed into competent cells containing a lacZΔM15
allele for α-complementation and cultured as recommended by the manufacturer.



Table 10.

Primers for Cloning Into pMAL Vector

Primer Name    Nucleotide Sequence

COD            TGT GAA TTC ATG AAA TCC AAC CAA GAG CG
3COD           TGT GAA TTC ATG ATC CCA CAC ACG CTC AC
8COD           TGT GAA TTC ATG GTG CAG GCC CAG ATC
RCOD           TTC GAA GCT TCT ACT TGC CTA CAT TAG AC
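
The linker design of the Table 10 primers can be checked by inspection of the sequences. The following is a minimal Python sketch, not part of the original procedure, that takes the four primer sequences as transcribed from Table 10 (spaces removed) and locates the EcoRI (GAATTC) and HindIII (AAGCTT) recognition sites. The function and variable names are hypothetical and chosen only for illustration.

```python
# Primer sequences transcribed from Table 10, with spaces removed.
PRIMERS = {
    "COD": "TGTGAATTCATGAAATCCAACCAAGAGCG",
    "3COD": "TGTGAATTCATGATCCCACACACGCTCAC",
    "8COD": "TGTGAATTCATGGTGCAGGCCCAGATC",
    "RCOD": "TTCGAAGCTTCTACTTGCCTACATTAGAC",
}

ECORI = "GAATTC"    # EcoRI recognition site (5' primers)
HINDIII = "AAGCTT"  # HindIII recognition site (3' primer)

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def site_report(name, seq):
    """Summarize primer length and the 0-based offset of its linker site."""
    site = HINDIII if name == "RCOD" else ECORI
    offset = seq.find(site)  # -1 would mean the site is absent
    # Three bases directly after the site; ATG is expected for the 5' primers.
    start = offset + len(site)
    return {
        "length": len(seq),
        "site": site,
        "offset": offset,
        "following": seq[start:start + 3],
    }


REPORT = {name: site_report(name, seq) for name, seq in PRIMERS.items()}

if __name__ == "__main__":
    for name, info in REPORT.items():
        print(name, info)
    # The reverse complement of the 3' primer reads in the sense orientation.
    print("RCOD reverse complement:", reverse_complement(PRIMERS["RCOD"]))
```

In this transcription each 5' primer carries GAATTC followed directly by an ATG codon, and the reverse complement of RCOD places a TAG stop codon immediately before AAGCTT, which is consistent with the cloning strategy described in the text.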



IX. Expression of Ataxin-1, Design of Antigenic Peptides and
Production of Antibodies

The fusion proteins expressed by the constructs in Example VIII were
purified as directed by the manufacturer using affinity chromatography (pMAL™
vector Package Insert, 1992 New England Biolabs, revised 4/7/92). The purified
protein was electrophoresed using 8% SDS polyacrylamide gel electrophoresis and
electroeluted. The best expression (about 27 mg from 1 L of cells) was obtained
from the shortest construct, but all constructs produced measurable levels of protein
of a size consistent with their respective cloned gene product.

Antibody response in rabbits was initiated using the multiple
antigenic peptide strategy of V. Mehra et al., Proc. Natl. Acad. Sci. USA, 83, 7011-
7017 (1986). In addition to the three electroeluted cloned gene products described
in the preceding paragraph, three synthetic peptides were used as well. The
synthetic peptides used were Peptide A (amino acids 4 through 18), Peptide B
(amino acids 162 through 176) and Peptide C (amino acids 774 through 788).
These peptides were chosen such that they showed little or no homology with other
known short amino acid stretches in proteins and also such that they contained
proline, which makes it more likely that these fragments are located on the surface
of the protein, thus making it more likely that antibodies to the fragments will react
with the whole protein as well.

Immunoglobulin (IgG) from rabbit blood was purified, and
antibody/antigen results were analyzed using Western blots as described in Gershoni
et al., Anal. Biochem., 131, 1-15 (1983). IgG from rabbits injected with the cloned
gene products and the synthetic sequences was found to hybridize to the
respective antigens. The antisera from rabbits immunized with the 8COD-RCOD
gene product (i.e., the ataxin-1 fragment spanning residues 548 through 817)
hybridized with a protein of the expected size in brain tissue extracts from mice,
rats, and humans. A similar size protein has also been detected using lymphoblasts.
This hybridization is blocked by preincubation with the polypeptide antigen, and not
blocked by unrelated antigens. In particular, antibodies raised against Peptide C are
blocked by either Peptide C or the short gene product.

X. Molecular and Clinical Correlations in Spinocerebellar Ataxia
Type 1 (SCA1)

A. Materials and Methods
1. Family Material

Members representing 87 kindreds with dominantly inherited ataxia
were evaluated. Nine kindreds of diverse ethnic background (Caucasian American,
African American, South African, Siberian Iakut) were already known to have
SCA1 based on linkage to the HLA locus and to D6S89 on chromosome 6p.
Genotypic analysis of the SCA1 CAG repeat was carried out on all nine kindreds to
determine if all known SCA1 families had the same mutational mechanism
involving repeat expansion. Most of the study participants were personally
examined. The affected status was always confirmed by a neurologist, but the age
of onset was based on historical information from the patient and/or other family
members. Severity of disease was measured by the age at death minus the age of
onset. Detailed characterization of the repeat variability was carried out for all nine
kindreds. To identify additional kindreds with a CAG expansion at the SCA1 locus,
affected individuals from 78 newly identified families with dominantly inherited
ataxia were clinically examined. Blood was collected from at least one affected
individual from each of these kindreds and screened by DNA analysis for the
presence of a CAG repeat size within the expanded range (> 42 repeats). Although
there was no evidence that these 78 individuals are related, there is a chance that
some of the affected patients come from the same families.

To assess the distribution of CAG repeat sizes on normal
chromosomes further, the number of CAG repeats was determined for 304 normal
chromosomes from unrelated individuals of various ethnic backgrounds.

2. Molecular Studies 

Blood samples were used to establish lymphoblastoid cell lines by
Epstein-Barr virus transformation. Genomic DNA was isolated either directly from
venous blood or from lymphoblastoid cell lines. Blood samples were collected from
these patients over an 8-year period, during which time 29 patients died. PCR
reactions were performed using the Rep1 (TTGACCTTTACACCTGCAT) and
Rep2 (CAACATGGGCAGTCTGAG) primers. Fifty nanograms of genomic DNA
was mixed with 5 pmol of each primer in a total volume of 20 µl containing 1.25
mM MgCl2, 250 µM dNTPs, 50 mM KCl, 2% formamide, 10 mM Tris-HCl pH 8.3
and 1 unit AmpliTaq (Perkin-Elmer/Cetus). The Rep1 primer was labelled at the 5'
end with [γ-32P]ATP. Samples were denatured at 94°C for 4 minutes, followed by
30 cycles of denaturation (94°C, 1 minute), annealing (55°C, 1 minute) and
extension (72°C, 2 minutes). Six µl of each PCR reaction was mixed with 4 µl
formamide loading buffer, denatured at 90°C for 2 minutes, and electrophoresed
through a 6% polyacrylamide/7.65 M urea DNA sequencing gel. Allele sizes were
determined by comparing migration relative to an M13 sequencing ladder.
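
Where the sequence of an allele is available, the repeat number can also be counted directly from the sequence rather than read from a gel. The sketch below is illustrative only: it counts the longest uninterrupted run of CAG units in a sequence string and compares it with the > 42 repeat threshold used above for screening. The function names and example sequences are hypothetical, and because the scoring of interrupted tracts (for example, CAT units within the tract) is not specified here, treating the longest pure run as the repeat number is an assumption.

```python
import re


def longest_cag_tract(seq):
    """Return the number of units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def in_expanded_range(n_repeats, threshold=42):
    """True if the repeat number exceeds the screening threshold (> 42)."""
    return n_repeats > threshold


if __name__ == "__main__":
    # Hypothetical sequences built for illustration, not patient data.
    pure = "TGAG" + "CAG" * 50
    interrupted = "CAG" * 12 + "CATCAGCAT" + "CAG" * 15
    for label, seq in (("pure", pure), ("interrupted", interrupted)):
        n = longest_cag_tract(seq)
        print(label, n, in_expanded_range(n))
```

A different scoring rule, such as counting CAT interruptions as part of the tract, would change the numbers for interrupted alleles, so the rule should be fixed before results from different alleles are compared.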

3. Statistical Analyses 

The relationship between age of onset and CAG repeat number on
both the affected and the normal chromosomes of patients was evaluated through
linear regression analyses. Similarly, the relationship between repeat length and
duration of disease was quantified. Ages of onset were used directly in these
analyses, but also following logarithmic and square root transformation. Although
the latter transformation provided the best approximation to a normal distribution,
results obtained were consistent between analyses before and after transformation.
Analysis of variance was performed to detect differences among the families in the
mean age of onset, after correction for the effect of the CAG repeat number on age
of onset. In addition, the sex of the transmitting parent was included as a possible
explanatory variable for variations in age of onset. All regression and variance
analyses were carried out with the SPSS package of computer programs, version
4.0.1.
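
The regression of age of onset on repeat number, with and without square root transformation, can be sketched as follows. This is a minimal illustration in Python and not the SPSS procedure actually used. The (repeat number, age of onset) pairs are invented for demonstration and are not data from this study, and the function name is hypothetical.

```python
import math


def least_squares(x, y):
    """Ordinary least squares fit of y on x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical values for illustration only (not study data).
repeats = [43, 47, 50, 54, 58, 63, 69]
onset_years = [52, 44, 38, 33, 27, 19, 12]

# Fit on the raw ages and on square-root-transformed ages.
slope_raw, intercept_raw, r_raw = least_squares(repeats, onset_years)
slope_sqrt, intercept_sqrt, r_sqrt = least_squares(
    repeats, [math.sqrt(age) for age in onset_years]
)

if __name__ == "__main__":
    print("raw:  slope=%.3f r=%.3f" % (slope_raw, r_raw))
    print("sqrt: slope=%.3f r=%.3f" % (slope_sqrt, r_sqrt))
```

Fitting both the raw and the transformed ages side by side mirrors the comparison described above, in which results were checked for consistency before and after transformation.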

B. Results

1. Family Studies

All affected individuals from the nine known SCA1 kindreds had an
expanded trinucleotide repeat on one of their alleles. No repeat expansions were
observed among eight kindreds previously shown by linkage analyses not to be
SCA1. These eight kindreds were examined for the SCA1 gene expansion to
confirm the linkage results.

Among the 70 other dominant ataxia families analyzed, three (4%)
were found to have an expanded CAG repeat on one of the SCA1 alleles. Of all of
the dominant kindreds studied, 12 of 87 (14%) have an expanded CAG repeat at the
SCA1 locus. While the sample size is relatively small, and both estimates are
arguably biased to exclude or select for SCA1 kindreds, expanded CAG repeat tracts
within the SCA1 gene clearly account for only a small fraction of this complex
group of diseases. The distributions of the CAG repeat number from normal controls
and from ataxic individuals that did not have an expansion were similar (data not
shown). These data argue against the involvement of the CAG repeat at the SCA1
locus in these families. However, it is still possible that some of these small
families have other mutations at the SCA1 locus.
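
The two proportions quoted above follow directly from the kindred counts given in the text. The short Python sketch below only restates that arithmetic; the constant and function names are hypothetical.

```python
# Kindred counts taken from the text.
TOTAL_KINDREDS = 87
KNOWN_SCA1 = 9        # previously linked to SCA1
KNOWN_NON_SCA1 = 8    # previously excluded from SCA1 by linkage
NEW_EXPANSIONS = 3    # expansions found among the remaining families


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


OTHER_FAMILIES = TOTAL_KINDREDS - KNOWN_SCA1 - KNOWN_NON_SCA1
ALL_EXPANDED = KNOWN_SCA1 + NEW_EXPANSIONS

if __name__ == "__main__":
    print(OTHER_FAMILIES, round(percent(NEW_EXPANSIONS, OTHER_FAMILIES), 1))
    print(ALL_EXPANDED, round(percent(ALL_EXPANDED, TOTAL_KINDREDS), 1))
```

The first figure excludes the kindreds whose SCA1 status was already known, and the second includes them, which is why the text notes that the two estimates are biased in opposite directions.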

The typical clinical findings in the genetically proven SCA1 kindreds
were gait and limb ataxia, dysarthria, pyramidal tract signs (spasticity, hyperreflexia,
extensor plantar responses) and variable degrees of oculomotor findings which
include one or more of the following: nystagmus, slow saccades, and
ophthalmoparesis. In the later stages of the disease course, bulbar findings consistent
with dysfunction of cranial nerves IX, X, and XII became evident. Also, dystonic
posturing and involuntary movements including choreoathetosis became apparent in
the later stages of the disease. Motor weakness, amyotrophy, and mild sensory
deficits manifested as proprioceptive loss were also detected. Although ataxia,
dysarthria and cranial nerve dysfunction were consistently present in every SCA1
affected individual, considerable intrafamilial variability was noted with regard to
all of the other clinical features. Juvenile onset (< 18 years) was observed in four
kindreds. Of interest is the finding that juvenile onset cases typically inherited the
disease gene from an affected father. Several of the kindreds that did not have an
expanded SCA1 CAG repeat displayed the same clinical findings as those observed
in SCA1 kindreds, confirming the inherent difficulty in clinically classifying this
group of disorders. While it is possible that some of these kindreds have other
mutations at the SCA1 locus, the disease locus (loci) for eight of these families has
also been excluded from the SCA1 region by linkage analyses.

2. Repeat Analysis on Normal and SCA1 Chromosomes

Figure 20 shows the size distribution of the CAG repeats on 304
chromosomes from unaffected control individuals who are at risk for ataxia, and 113
expanded alleles from individuals affected with the disease. The normal alleles
range in size from 19 to 36 CAG repeat units. Over 95% of the normal alleles
contain from 25 to 33 CAG repeat units, the majority (65%) of which contain 28 to
30 repeats. The mean repeat sizes on normal chromosomes for the African
American, Caucasian, and South African populations are very similar, with 29.1,
29.8, and 29.4 CAG repeat units, respectively. Combined heterozygosity for the
CAG repeat at the SCA1 locus was 0.809 for the populations examined, giving an
overall polymorphism information content (P.I.C.) value of 0.787. No change in
CAG repeat length was observed for 135 meioses of SCA1 alleles containing CAG
repeat tracts within the normal range, i.e., all were inherited in a Mendelian fashion.
In contrast, 41 of the 62 meioses involving expanded SCA1 alleles changed in
repeat size. The rate of repeat instability for female meioses was 60%, while the
instability observed for males was 82%.

The number of CAG repeats found on SCA1 chromosomes from 113
affected individuals was always greater than the number of repeats on normal
chromosomes, ranging from 42 to 81 with a mean of 52.6 (Figure 20).
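
Heterozygosity and P.I.C. values such as those reported in this section are computed from allele frequencies. The following Python sketch assumes the commonly used definitions, H = 1 - sum(p_i^2) and P.I.C. = H - sum over i < j of 2 p_i^2 p_j^2 (Botstein et al.); the text does not state which formula was applied. The allele frequencies below are hypothetical placeholders and are not the frequencies observed for the SCA1 repeat, and the function names are illustrative.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (assumed Botstein et al. definition)."""
    h = heterozygosity(freqs)
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return h - cross


if __name__ == "__main__":
    # Hypothetical allele frequencies for illustration only.
    example = [0.05, 0.10, 0.25, 0.30, 0.20, 0.10]
    print("H   = %.3f" % heterozygosity(example))
    print("PIC = %.3f" % pic(example))
```

Under these definitions P.I.C. is always somewhat lower than the heterozygosity, as is the case for the two values reported above (0.787 against 0.809).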

All patents, patent documents, and publications cited herein are
incorporated by reference. The foregoing detailed description and examples have
been given for clarity of understanding only. No unnecessary limitations are to be
understood therefrom. The invention is not limited to the exact details shown and
described, for variations obvious to one skilled in the art will be included within the
invention defined by the claims.

WHAT IS CLAIMED IS:

1. A nucleic acid molecule containing a CAG repeat region of an isolated
autosomal dominant spinocerebellar ataxia type 1 (SCA1) gene, said gene
located within the short arm of chromosome 6.

2. The nucleic acid molecule of claim 1 corresponding to the entire SCA1 gene.

3. The nucleic acid molecule of claim 1 wherein the SCA1 gene encodes
ataxin-1.

4. The nucleic acid molecule of claim 3 of about 2.4-11 kb in length containing
the coding region of the SCA1 gene.

5. The nucleic acid molecule of claim 1 wherein the CAG repeat region is 
represented by (CAG)n and n = 2-36. 

6. The nucleic acid molecule of claim 5 wherein n = 19-36. 

7. The nucleic acid molecule of claim 1 wherein the CAG repeat region is
represented by (CAG)n and n > 36.

8. The nucleic acid molecule of claim 7 wherein n > 43. 

9. The nucleic acid molecule of claim 1 wherein the molecule is a single- 
stranded polynucleotide. 

10. The nucleic acid molecule of claim 9 wherein the single stranded 
polynucleotide is cDNA. 

11. The nucleic acid molecule of claim 9 wherein the single stranded
polynucleotide is mRNA.

12. The nucleic acid molecule of claim 1 wherein the nucleic acid is genomic
DNA.

13. An isolated oligonucleotide that hybridizes to a nucleic acid molecule
containing a CAG repeat region of an isolated SCA1 gene; said
oligonucleotide having at least about 11 nucleotides.

14. The isolated oligonucleotide of claim 13 having at least about 16 nucleotides. 

15. The isolated oligonucleotide of claim 14 having no more than about 35 
nucleotides. 

16. The isolated oligonucleotide of claim 13 that produces a primed product of 
about 70-350 base pairs. 

17. The isolated oligonucleotide of claim 16 that produces a primed product of 
about 100-300 base pairs. 

18. The isolated oligonucleotide of claim 13 that hybridizes to the nucleic acid 
molecule within about 150 nucleotides on either side of the CAG repeat 
region. 

19. The isolated oligonucleotide of claim 18 that hybridizes to the nucleic acid 
molecule directly adjacent to the (CAG)n region. 

20. The isolated oligonucleotide of claim 13 having at least about 100 nucleotides. 

21. The isolated oligonucleotide of claim 20 having at least about 200 nucleotides.

22. The isolated oligonucleotide of claim 13 comprising a nucleotide sequence
selected from the group consisting of CCGGAGCCCTGCTGAGGT (CAG-a),
CCAGACGCCGGGACAC (CAG-b), AACTGGAAATGTGGACGTAC
(Rep-1), CAACATGGGCAGTCTGAG (Rep-2),
CCACCACTCCATCCCAGC (GCT-435), TGCTGGGCTGGTGGGGGG
(GCT-214), CTCTCGGCTTTCTTGGTG (Pre-1), and
GTACGTCCACATTTCCAGTT (Pre-2).

23. A method for detecting the presence of a DNA molecule containing a CAG
repeat region of the SCA1 gene comprising:

(a) digesting genomic DNA with a restriction endonuclease to obtain DNA
fragments;

(b) probing said DNA fragments under hybridizing conditions with a
detectably labeled gene probe, which hybridizes to a nucleic acid
molecule containing a CAG repeat region of an isolated SCA1 gene
having at least about 11 nucleotides;

(c) detecting probe DNA which has hybridized to said DNA fragments; and

(d) analyzing the DNA fragments for a CAG repeat region characteristic of
the normal or affected forms of the SCA1 gene.

24. The method of claim 23 wherein the step of analyzing comprises analyzing for 
a (CAG)n region wherein n > 36. 

25. The method of claim 24 wherein the step of analyzing comprises analyzing for
a (CAG)n region wherein n > 43.

26. The method of claim 23 wherein the detectably labelled DNA sequence
comprises a portion of an EcoRI fragment of the SCA1 gene.

27. The method of claim 26 wherein the EcoRI fragment comprises about 3360 base
pairs.

28. A method for detecting the presence of a DNA molecule located within an
affected allele of the SCA1 gene comprising:

(a) treating separate complementary strands of a DNA molecule containing
a CAG repeat region of the SCA1 gene with a molar excess of two
oligonucleotide primers;

(b) extending the primers to form complementary primer extension
products which act as templates for synthesizing the desired molecule
containing the CAG repeat region;

(c) detecting the molecule so amplified; and

(d) analyzing the amplified molecule for a CAG repeat region characteristic
of the SCA1 disorder.

29. The method of claim 28 wherein the step of analyzing comprises analyzing for 
a (CAG)n region wherein n > 36. 

30. The method of claim 29 wherein the step of analyzing comprises analyzing for 
a (CAG)n region wherein n > 43. 

31. A protein encoded by the SCA1 gene having therein a glutamine repeat
region.

32. The protein of claim 31 having a molecular weight of about 20-90 kD.

33. The protein of claim 31 having the amino acid sequence shown in Figure 15.

34. An antibody to a protein encoded by DNA containing a CAG repeat region of
the SCA1 gene.

35. A method for detecting the SCA1 disorder comprising:

(a) contacting an antibody to a protein encoded by the SCA1 gene with a
biological sample containing antigenic protein to form an antibody-
antigen complex;

(b) isolating the antibody-antigen complex; and

(c) sequencing the antigen portion of the antibody-antigen complex using
amino acid sequencing techniques.
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1 TTTTGAAACT TGCAGAGAAC AGfiXTTATW CTGGCGGCCT CTGCTGACTT eGCCTGTfiTG 
61 TGTGTGTTTG TGTGTGTGTG TATTAGGGAG AGGAAATC6T ACGTCCAGT6 TGGACCCASA 
121 GCTAAGGGSA ATCTTGGAGA GTAGTGGCTC XGGCAGATGA GGATTCAGAA ATCGAGTGCA 
Wl A66ACTGTTC TGCACTTTCA CTGCTAACCT GCTTTTTCTC ACTCCCTGGC TCTGAGGGCA 
241 GGCTCCACCT GGTGXCATGC. TCTCCAACGG CWCATTTTA TGTTCCACCC AGGCAAAGGA 
301 GAGGTGAGAA ATGGAACCAA CATTTCT6AA AAGGAAATTT AA6AACTGCA TCATCTGCCC 
361 TTGAAGAASA AAAGGAGAAA AAAAAACAG6 AGAGAG66TA OTGAGAACAT CTTAGGGGAG 
421 TXGXTAACTC CAWAAAAAA TATATGT6TT ACAGTGTTCA CXTGCCCACT GTCTTCATAA 
481 YCTT CCTYTA lAATCTGCAG CTGCCACGGC TAGT G T III I GTTTTTGTTG ' TTGTTGTTTT 
541 GTTTCGTTTT TGGAGACAGA GT6TCGCTCT GTTGCCCAGG CTGGAGTACA ATGGXGCAXT 
601 CTCGGCTCAC TGCAACCTCT GCCTCCTGGG TTCAAGCAAT TCTCCTGCCT CA6CCTCTCA 
661 A6TAGCTGGG ACTACAGCC6 TGTGCCAGCT AAX6TTACAC CAGGCTAAAT TTGTTTTTTA 
721 TTTTTTATTT TTGGtASASA C65G6TTTCA CCATGTTAGC CAGSATSGTC TTAATCTCCT 
781 GACCTC6TGA TCTSCCT6CC TCGGCCTCCC AAAGTGTTGG CTA6TGWTI CtCTGCTTCA 
841 GTGCTTGG6G TAT6ATTGGG TTATGGGAGT TCACACCGAG TCCA6GGCCT AGTCTTAATC 
901 TTGCCAAAGA XGTTCTTTCC CXGGT6CTCA - TGTTCTGATG TCCTTTCCCX CCTTCCCTTT 
961 CTCCTCCCTT TCCTTTTCCC TTTGTCACTC CCCTCTTCCC TTTCCCAGCA TCCA6AGCTG 
1021 CT6TTGGCGG ATX6TACCCA C6GGGAGATG ATTCCTCATG -AAGACCCXCG ATCCCCtACA 
1081 GAAATCAAAT GTGACTTTCC GXTTAffCAGA CTAAAATCAG ASCCATCCAG AACAGT6AAA 
1141 CAGXCACCGT GGAGGG6GGA CGGCGAAAAA TGAAATCCAA CCAA6AGCGG AGCAACGAAT 
i201 GCCTGCCTCC CAAGAAGCGC GAGATCCCC6 CCACCAGCCG GTCCTCGGAG CSAGAAGGCCC 
1261 CTACCCTGAC CCAGCGACAA CCACC6GGTG ' GA6GGCACAG C31TTGGCTCC CSGGCAACCC 
1321 TGGTGGCCGG GGCCACGGGG GCGGGAGGCA TGGGCCGGCA GGSACCTCGG TG6A6CTTGG 
1381 TTTACAACAG GGAATAGGTT TACACAAAGC ATTGTCCACA 6GGCTGGACT ACTCCCCGCC 
1441 C&GC6CTCCC AGGTCTGTCC CCGTGGCCAC CACGCTGCCT GCCGCG!CACG CCACCCCGCA 
1501 GCCA6G6ACC CCGGXGTCCC CC6TGCA6TA CGCTCACCTG CCGCACACCT TCCAGTTCAT 
1561 TGSGTCCTCC CAAIACAGT6 6AACCTATGC CAGCTTCATC CCATCACAGC TGATCCCCCC 
1621 AACCGCCAAC CCCGTCACCA 6T6CAGTGGC CTCGGCGCAG GG6CCACCAC TCCATCCCAG 
1681 CGCTCCCAGC TCGACGCCTA T7CCACTCTG CTGGCCAACA TGGSCAGTCT GAGCCA6ACG 
1741 CCGGGACACA AGGCTGAGCA GCAGCAGCAG CACCACCAGC AGCAGCAGCA GCACCATCAG 
1801 CATCAGCAGC AGCAGCAGCA GCAGCAGCAG CAGCAGCAGC AGCAGCAGCA CCTCAGCAGG 
1861 GCTCCGGGGC TCAXCACCCC GGGTCCCCCC CAACCAGCCC- AGCAGAACCA GtTACGTCCAC 
1921 ATTTCCAGTT CTCCGCAGAA CACCGGCCGC ACCGCCTCXC CTCCGGCCAT CCCC6TCCAC 
1981 CTCCACCCCC ACCAGACGAT GATCCCACAC AC6CXCACCC TG6GGCCCCC CTCCCAGGTC 
2041 GTCATGCAAT ACGCCSACTC CGGCAGCCAC TTTGTCCCTC 6GGAGGCCAC CAAGAAAGCC 
2101 GAGAGCAGCC GGCTGCAGCA 6GCCATCCAG GCCAAGGACG TCCTGAACGG T6AGATGGAG 
2161 AAGAGCCGGC 6GTACGGGGC CCCGTCCTCA GCC6ACCTGG GCCTGG6CAA GGCA66CGGC 
2221 AAGTCGGTTC CTCACCC6TA CGAGTCCAGG CACGTGGXGG TCCACCCGAG CCCCTCAGAC 
2281 TACA6CAGTC GTCAXCCTTC GGGGGTCCGG 6CCTCTGTGA TG6TCCTGCC CAACAGCAAC 
2341 ACGCCCGCAfi CTGACCXGGA GGT5CAACAG GCCACTCATC GTGAA6CCTC CCCTTCTACC 
2401 CTCAACGACA AAA6TGGCCT GCA5PTTA6GG AAGCCTCGCC ACCGGTCCTA CGCGCTCTCA 
2461 CCCCACACGG TCATTCAGAC CACAO^CAGT GCTTCAGAGC CACXCCCG6T GGACTGCCAG 
2521 CCACGGCCTT CTACGCAGGG ACTCAACCCC CTGTCATCGG CXACCTGAGC GGCCAGCAGC 
2581 AA6CAATCAC CTACGCCGGC A6CCTGCCCC AGCACCTGGT GATCCCCG6C ACACAGCCCC 
2641 TGCXCATCCC 6GTCGGCAGC ACTGACATGG' AAGCGTCGGG GGCAGCCCCG GCCATAGTCA 
2701 C6TCATCCCC CCAGTTTGCT GCAGTGCCTC ACAC6TTC6T CACCACCGCC CTTCCCAAGA 
2761 GCGAGAACTT CAACCCTGAG GCCCTG6TCA CCCAGGCCGC CTACCCAGCC ATGGT6CAG6 
2821 CCCAGATCCA CCXGCCTGTG GTGCAGTCCG TGGCCTCCCC GGCGGCGGCT CCCCCTACGC 
2881 TGCCTCCCTA CTTCAXGAAA GGCTCCATCA TCCAGTTGGC CAACGGGGAG CMAAGAAGG 
2941 TGGAAGACTT AAAACAGAA6 ATTTCATCCA GA6TGCAGAG ATAAGCAACG ACCTGAAGAT 
3001 CCACTCCAGC ACCGTAGAGA GGATTGAAGA CAGCCATAGC CCGGGCGTGG CCGTGATACA 
3061 GTTCGCCGTC G6G6AGCACC GAGCCCAGGT AACGTTAGCC AGGGTGGCAC AGGGATGGGA 
3121 CACCATACCG TGATGCCATC ATCATCTCCT GGCAAGACCA ATTGCTTCTA TGAGGCAGGA 
3181 TTAAGGGTTC TCGGGTACAC CTAGACCTTA GACTCGGCCT 7TCCCAACTG CGTXCTCTAG 
3241 AAAAAATAAG CCCCATTTCC CCGTGATCTC TGCT6TGTGT AAXGAATTAA CCTCCATGCA 
3301 TGGAGAGTGG GGCTAGTTAT GGAGTCCTT6 ACACAATCCA GAAACTCACC ACTCTCG^A 
3361 TTTTXT

Patient #1 (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n=56.

CA6CA6CX6CJ^CA6CAGCA6CX6CA6CA6CACCA6CA6CAGCAGCA6CAG(^GCX6C^ 

CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCiUSCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCXG 

CAGCAGCA6CAGCA6CAGCAGCA6CAGCAGCAGCAGCACCTCAGCAGGGCTCCG6GGCTCATC 



Patient #2 (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n=69.

CAGCAGCAGCAGCAGCASCAGCAGCAGCASCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGC^^ 
CAGCAGCAGCAGCAGCAiGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG 
OUSCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAlSCAGCAGCAGCAGCAGblGCAG^ 
CA6CAGCAGCACCTCAGCAGGGCTCCGGG6CTCA7C 



Patient #3 (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n=47.

CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGO^GCAGCAGCJUSCAGCAGCAGCAGC^ 

CAGCAGCAGCAGCAGCAGCAGCAGCAGCASCAGCAGCAGCAGCAGCAGCAGCAGCAG^^ 

CAGCAGCAGCACCTCA6CAGGGCTCCGG6GC7CA7C 



Patient #4 (CAG)nCACCTCAGCAGGGCTCCGGGGCTCATC; n=48.

CAGCAGCAGCAGCAGCAGCAGCAGCAGCJUSCAGCAGCAGCAGCAGaiGCAGCAGCA^ 

CASCA6CAGCAGCA6CA6CA6CA6CA6CAGCAGCAGCAGCAGCA6CA6CA6CAGCA6CA6CAGCA6 

CA6CA6CA6CA6CACCTCAGCAG6GCTCCGGGGC7CATC 



Patient #5 TGAG(CAG)n; n=50.

•TGAGCAGOkGCAGCAGCAGCAGCAGCAGCAGa^CAGCAGCXGCAGCAGCAGCAGCAGC^^ 

GCAGCAGCAGCAGCAGCAGO^GCAGCAGCAGCAGCXGCAGCAGCAGCAGCJ^CAGCAGCAGCAGC^ 

GCAGCAGCAGCAGCAGCA6CAG 



FIGURE 2 
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1 GlLTCCCCtxL^ ACCGCCftACC CCGTCIiCCJia Tu<i&GTGGCC TCCGCCC3U3G 

51 GGCCS^CACT CCATCCCAGC CCTCCCAGCT GGAGGCCTJLT TCCACTCTGC 

Rft&-2 V ' — ^ 
101 TGGCCAACAT GGGCAjSTCTG AjSCCAGACCC CGGGACACAX 6GCTCA.CCXt; 

151 aiGCAGCAGC AGOIGCAGCA 6CAGCAGCA6 CAACATCX6C ATCAGCAGCX 
201 GCJL6CAGC3LS CAjSCJIGCXGC AGCAGCAGCX CCACCJIGCTlC CTCASCXGGG 

— — ^ gg.r.oi4 ^ 

251 CTCCGCGGCT CJITCACCCCG GGTCCCCCCC ACCM5CCCAG .CACAACCACT 

Pr»«2. 

aOl ACGTCCACm TTCCAGTa?CT CCGCAGAXCA CCGSCCGCAC CGCCTCTCCT 

351 CCGCCCATCC CCGTCCACCT CCACCCCOU: CAGACGATGA TCCCACACJiC 

««01 GCTCACCCTG GGGCCCCCCT CCCAGGTCGT CATGCaATAC GCCGa.CTCCG 

451 CCAI2CCACTS TGTCCCTC6G GAGGCC&CCA ACJAAGCCGA GAGCAGCCCC 

501 CTGCAG.

FIGURE 15



[Position numbering for the sequence below: nucleotides 1 to 3511 in steps of 90; amino acids 1 to 800.]



CTACTACaGTGGaSGACGTACAGGACCTGTTTCACTGO^GGG^ 
CRAGACATTGTTTCTCTCCCTCTGCCCCCCCTTCCCCa.CGCAA 

CftAfirarn^ fln CTCCCrGATGGAAAGGAGCATCGTGCATCAAgTCRCCAGGGTGGTCCATTCJUVGC^ rmTlT GTaiTCCTTGT 

ACAGCAATCTCCTCCTCCACTGCCACrACAGGGAAGTGCATCaaVTGTC^ 

AAACTT-MTCCTGCTGCAGACaVGGAAaVAGAGAGAA^ 

TAGGCGTTTTAOICTGAGATTCTCCACTGCCACCCTTTCTACTCAAGC^^ 

ATGGTTCTCOlTTCTGATGAAAGCACATGCyrACMTTTTCCAAAGAAATTAGA^ 

TCATAGGGTArrTCTCACTTCTCTGTGAAAGGAAGAAAGAACAraCCTGAGCCa^^ 

TCTCCATCGTGAAGTATAGGCT^GGCPACCTGTGAACaGTACG^ 

CKSMATGATTCCTGATGAAGAGCCTGGATCCCCTAOUSAAATCAAATGT^ 

GTGAAACAGTOICCGTGGAGGGGGGACGGCGAAAAATGAAATCCAACCAAGAGCG^ 

MKSMQERSNBCLPPKKREI 
TCCCCGC(3lCCAGCCGGTCCTCCGAGGAGAAGGCCCCTACCCTGCCCAGCGACAAC^ 

PATSRSSE E KAPTL PSD NKRVEGTAWLPON 
ACCCTGGTGGCCGGGGCCACGGGGGCGGGAGGCATGGGCCGGCAGGGACCTCX^^ 

PGGRGHGGGRHGPAGT SV ELGLQQGIGLHK 
AAGOlTTGTCCACSkGGGCTGGACTACTCCCasaxaWSCGCrCC 

ALSTGLDYSPPSAPRSVPVATTLP AAVATP 
CGCAGCCAGGGACCCCGGTGTCCCCCGTGCAGTACGCTCACCTGCCGaVCaiCCTTCCAGTTC^ 

QPGTPVSPVQYAHLPHTF QPIGSSQYS GTY 
ATGCOIGCTTCATCCCATCACAGCTGATCCCCCCAACCGCCAACCCCGTC^ 

AS F I PSQL I P PTANP VTSAVASA AGATTPS 



QR SQLEAySTLLAN MGSI*SQTPGHKAEQQQ 
AGCAGCAGCMCAGCAGCAGCAGCAGaU5CATCAGCATCAGCJ«5C^ 

QQQQQQQQQH QHQQ QQQQQQQQQQ QQQ Hi^g 
GCAGGGCTCCGGGGCTOlTCACCCCGGGGTCCCCCCCSlCCAGCCCAGaw^ 

RAPGLITPGS PPPAQQNQYVHISSSPQNTG 

CCCCTCCC 
LTIfGPPSQ 



TASPPAI PVHLHPHQTMI PHT, 
AGGTCGTCATGCAATACGCCGACTCCGGCAGCCACrrTGTCCCTCGGGAGGCCAC 

VVMQYAOSGSHPVPREATKKAB SSR LQQAI 
TCOVGGCaVAGGAGGTCCTGAACGGTGAGATGGAGAAGAGCCGGCGGTACG^ 

Q A K E V L NGEMEKSRRYGAP S S A D L G L G K A Q 

GKSVPHPYE SRHV VVH PSPSD ^^^^'^'^'^^^^^^GGGGG 
TCCGGGCCTCTGTGATGGTCCTGCCCAAOUSCAAaVCGCCCGCAGCTGACC^^ 

r^lrJL ^^"^^PNSNTPA A DLEV QQ ATHRBASPS 
CTACCCrCAACGAOUUlAGTGGCCTGCATTTAGGGAAGa^^ 

TLNDKSGLHLGKPGHRSYALSPHTVIQTTH 
ACAGTGCTTCAGAGCCACTCCCGGTGGGACTGCCAGCCaaSGCCT^ 

''^'•"PLPVGL PATAP Y A G T Q P P V I G 



A S B 



QHL V I PGTOPLI.IPVGSTD 
ACATGGAAGCGTCX3GGGGCAGCCCCGGCCATAGTCACGTC31TCCCCCCAGTTO 

MEASGAAPAIVTSSP Q FA A VPHT PVTTALP 
CauUSAGCGAGAACTTCAACCCTGAOGCCCTGGTCACCCAGGC^ 

KSENPMPEAI.VTQ AAYPAMVQAQIHLPVVQ 
AGTCCGTGGCCTCCCCGGCGQCGGCTCCCCCTAaSCTGCCTCCCKlCTTCAT^^ 

SVASPAAAPPTLPPYPMKGSIIQLAHOELK 
AGAAGGTGGAAGACTTAAAAACAGAAGATTTCATCCAGAGTGCAGAGATAAGaWl^ 

KVEDLKTEDPIQSAEISNDI.KIDSSTVERI 
TTGAAGACAGCCATAGCCCGGGCGTGGCCGTGATACAGTTCGCCGTCGGGGAG^^ 

^ P S » S P GVAV I QP AVG E H RAQV S VBV LVE Y 
ATCCl Lllx 1 iGToA x-AtaGACAGGGCTGGTOlTCCTQCTGTCCGGAGAGAACCA^ 

PPFVFGQGWSSCCPB RTSQLFDLPCS KLSV 
TTGGGGATGTCTGCATCTCGCTTACCCTaUUSAACCTGAAGAAC^ 

GDV C ISI.TLKNLKN GSVK KGQPVDPASVL1* 
TGAAGCACTCAAAGGCCXSACGGCCTGGCXXMCaGCAGACACAGGTATGCCGAGCAGGAAAA^ 

KHSKADGLAGSRHRYAEQENGINQGSAQML 
TCTCTGAGAATGGCGAACTGAAGTTTCCAGAGAAAATGGGATTGCCTGCMCGCCOT 

SENGELKFPEKMGLPAAPFLTKIEPSKPAA 
CyUlCGAGGAAGAGGAGGTGGTCGGCGCCAGAGAGCCGCAAACTGGAGAAGTCAGAAGACGAACCAC^^ 

TRKRRWSAPESRKI.EKSEDEPPLT LPKPSL 
TAATTCCTCAGGAGGTTAAGATTTGCATTGAAGGCCGGTCTAATGTAGGCMGTAGAGGC^ 

IPQEVKICIEGRSNVGK* 
TTATCATi'i^iATCCAGATTACTGTACTGTAGGCrAAAATAACACAGT^ 
TTGTaiTTAGAGTTACAGCAGGTGTGTCGa«K5AGACltMTGCATATGC^^ 
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Figure 15 {continued) 
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GCACAGCAGGAGCGGTCAG6GCrCCJW3GCATCCCCGGGGAAGAAAGGAACGGGGC^ 3690 

AGCCGGGGGCGCTGACrCCCGCTAGTGTCAGGAGAAAAGTCCCGTGGGAAGJ^ 3780 

CACAGGCGCTGTGGCGGCGAGTGAGGGTCTCrrTTTCTCTGCCTCCCTCTGC 3870 

GAGCAGTGTCCTCCTGGGGTTCrCACGTGCJUUUlTCAACATCaGGAACCCAGC^ 3960 

GGAAAGTTAACaiTTTAAAAGAACATTTTTCTCTCCJVACATAT^^ 4050 
ATGGGGCCTGACTGCACTGATATAT ArXTmTrA AAGAGCAACTGCC^ * * aACTAGTGCAGCGATG 4140 

TCACCAGGGTGTTGTGGTGGAOiGGGAAGCCCCTGCrGTCATGGCCCCACATGGGGTAW 4230 

CGAACACCCACGCTGGTTTCTGTGCAGTGTTTkGGAAAACCAATCAGGTT^^ 4320 

CTTCAGTGAGAGCAACAGAAGCTCrTCACglTGAtOT 4410 

GACAACXCGGGTGCCA CrrriTriTi i i, I CAGATTCCAGTGTGACATGAGGAATTAGATrTTGAAaATGAG^ 4500 

GOlTTTAAAAATACrGTTCAaiCTTTATTACCAAGCATCITOGtCTCT 4590 

AAAACAAAAACAAAAAAAACTAAGTTGCTTTCTTTTTTTCAACAt^ 4680 

TGAAAGTTTCAATGTGGTTTAAAGGGATGAATGTGAATTATGAACTAG^^ 4 770 

CACTTTTavCTTTGATGTCTGAGAATCAGTTaUUSGCAT^ 4860 

OlTTTrrGTCCMTGTTTTTCTTTTTAAGATGAACTTT^ 4950 

TGCAGTTTTTATCCAATAAa^TTGTGGGAAAGGTTTGGGGGACTGAACGAGCATAAATAAATC^ 5040 

AACTCTAGGCCATTTTATAAGGTTATGTTCCOTTGAAAATTCATTTTGCT 5130 

GcrcrrAGAAACTcitaGA AiTrrcri 'caGATTcaT^^ s 22 o 

AATTACTTTATTATTGTTGTTATTAATGTTATTTTCAGAATGGL^^^^ i XCTATTCAAAATCAA ATCGAGA TTTAAT GTTTG GTACA 5310 

AACCO^SAAAGGGTATTTCATAQTTTTTAAACCrrTTCATTCCC^^ S 4 0 0 

CTTTAAAAAAAAGTTTTATAAGTAGGGAGAAATTTTTAAATATTCnTAC^^ 5490 

TTTTACCCeATTGAAAATAGTA L - riTL ' riVXi ' i ' TiC ACaAATTAAA 5580 

TTACawrTTAGGGTTCaCCauySACTAATGATTTTTATAAACLX^^^^ 5670 * 

ATAGTTCCTTGACTTTCCTCGAATTTCATTACCCTCTCAGCATGCT^ 5760 

TTAGTGCTGTA rrri ' T T A AACGTTTCTGTTCAGAGAA CnXjCrfA ATC 5850 

G' iU ' lVI ' U ' in ' rriX;rin ' iiTA GCCTTTGATGGTAAGAGGAATAC^ 5940 

CATGTGGACTCAGAAAAACACACACCACCnnrrTCGCTTAC^ € 03 0 
ACACaCATGGTTTQGAGCAATAGGAACATCATCATAA rm ' lU I GG T I C TATTTCAGGTATAGGAATTATAAAATAATTGGTTCITTCTA 6120 

AACACTTGTCCCATTTCATTCT CUUXjC ' lTri 1 1 ' A GCATGTGCAATACn^^ 6210 

GCTCATTCCCTTTTGGCTTlTTCCTTGTrrGGTTGATC^^ 6300 

GCCTCCCACCTTTCCCCTGCTGCGGATGCPGACnXSCTCGGGCGG^ €3 90 

GCCAGCCMGGAGACCCGGGGGAGGAACCGCaGTGTCCCCTGTCACCA 6480 
TTCATTTCTAAGAOjCaVCTCTGGAGCCATGTAGCCTGGACTCAACCC^^ j. J. iCTGCAAG TGGGCAGGCCCCTCCTCGG 6570 

GGTCTGTGTCCITGAGACITGGAGCCCTGCCTCTGAGCCTCGACX^ 6660 

GTGGCTGTGGAGGGGACCACCTGCCACCCACGCrrCACCACTCCXTTt^^ 6750 

AGCCrCCTGTTTGa^CGTTGQCGGGCCCCXSAGGCTCCCJl^^ 6840 

GGTAGAAATTClTCGGTGCCOTTCAGCrrTAaUU^GATCAGCC^ 6930 

rn " IUri ' LnU ' i ' Cl '' l '' i C CU ' Gll ' T ' i 'CC3lTTTTTAAACT 7020 

7110 



GTTCCTTAGAATGTTTAACTTAAGAATTATTTCAGTTTGTCTGGGCCACAC^^ 7200 

CACTTACCTCaUSATCrrrTAAAGTGGAAATCCAAATTOAATTT^ 7290 

TTCCCITIACTCACCCAGTTTAGTTTGGGATGATTTGATTTCT^^ 7380 

TCTGTTAGGTGAGTGTGTTGGGTTTTTTCCCCCCACCAGGAAGTGGCA^ 7470 

ACRC Ci ' Cr ' l"I ' L " l ' CAGGGACGGGGC3^S GTGl\»iXil\jl ' G GTACACTGA Ltjlxa TCCAt^^ 7560 

CAATTTCAAGGAA lU ' lUUti G Ain " rCC T:GCAT CU ' i\»iU 3^^ 7650 

GtfACaGTAGCTCCrAGTAATCATAAAATCXaWZTCTTTGCACS^ 7740 

AGCTCrGGA riTmrrrAUT ' l ITUri ' ll X Xii AGGAAACGATTGAaATACCCTTTAACATClGTGACTACTAAGG^ 7830 

ATAGAGAGAAAAATCTCa^TGCTTTTGAAGAOOTAATACCGTCCTACT 7920 

CCGGG CriTCn ' Xm GCT G UmTG G 'X'X' Q TCATGGCTACTGrTTCATGA 8010 

TTTGCACTTCAATTTGCACCAGGTGAAAACAGGGCCAGCAGACrCCATGGCra 8100 
TTACA tJ ' X ' X ' XVXn ' XUTX ' XU ' X ' A AGTGGCGTGGAGGCCTTTGCT^ X i X XA ACCCftGAATTTCTGAAATAGAGAArrrAAGAAC 8190 

ACATCAAGTAATAAATATACAGAGAATATACXU"Xn"lTATAAACCACATGCATC^ 8280 

GACAGTGTTGTGTTTCTGGCATAGGGAAACTCCAAACAACTTGCACAC 8 3 70 

CTTCAAATACGTTACCTTACTGATGATAGGAT Cnn"XU ' Cl IGTAGCACTATACCTTGT^ 8460 

GAAGCTGAAGAAAACakAAATTTTGAAGCACTaVCTTTGAGGAGTACA^ 8550 

AATGATTCATTCAGTGTTTGAAAGATATGGCTCTGTTGAAACAATGAGrrr 8640 

AAAGTTACATGTTTTTTTGTATATAGAAATTTGTCATGTCTAAATGATaWS^ 8730 
GGCTCTTAAACTATACCTATGCrrTATTGTT A ' XTT ' X ' X G ' X lA CATATAGCCCTCGTCTGAGGGAGGGGAACTCGGTA TTCTGCG ATTTGAGA 8820 

ATACTGTTa^TTCerATGGTGAAAGTAL'XTCT CT GAGCrCC LTX ' CXTA GTCTfl^ 8910 

TGATGTTTGACATTTTCAGCACTTCCTGTTCCTATAAACCCAAAGAATATAATCTTG;^ 9000 
ACCAATCAAAOlGGACTCATTATGGGGACAAAAAAAAAAAAAATTATTTaiCCIX'C X TCCCCCCACACCTCATTTAAATGGGGGGAGTA 9090 

AAAACM-GATTTCAATCTAAATGCCTCATTTTATTTTAGrTTTATtT^ 9180 

TCTTCrCAGAATAGTATTCCTGTCCAAAAATCAAGCCGGAa^GTGGAAACTGGACAGCTO 9270 

CTTAAATTCAGAATCTCGTCCCCTCCCTTCTCGTTGAAGGCAACTGTTCTGGTAGCT 9360 

CGGCTTGAGTTTTTOVTGTCCCCATGACTTGCATACAAATGGTTCAACTGTAT^ 9450 
ACAATAACAACAATCTCTAAGAATTTCCATAA C ' X " XT1 ' C ' X T A TCTGAAAGGACTCAAGTCtTCCACTGCAGATACATTGGAGGCTTCACCC 9540 

AC GTrX ' TCr T X CCCTTTA GT ' X XL>X X i GCTGTCTGGATGGCCAATGAGCCrGTCTCCTTTTCTGTGGCCAATCTGAAGGCC^ 9630 
Figure 15 (continued)



9631 GT GrXXjlT CACAGTAATCCTTACaUVGATAACATACTGTCCTCCAGAATACCAACT 9720 

9721 AGCAGTTACCaAGJU^GCTCGGTGCACAGGTTTTCrrCTGGTTC^ 9810 

9811 AATCCTTTAGTTAGTGCATTTGAACTTGGTACCTGTGCATTCAGTTCTGTGAATACTGC^ 9900 

9901 CCTGAACTGCTCAACTCTAAACCCAAATTAGTGTCAGCCGAAAGGAGGTTTCAAGAT^ 9990 

9991 AGACAGTCTTCAITTCCAGCCAGTGGAGTCCTXMCTCCAGAGCCATCTCTGAGACTCCGTAC^^ 10080 

10081 CCCACCATATGCCTCCCAaiGGCaUWKKSAAAACyVGACACplGJ^ 10X70 

10171 GAACTAGGGAAGGAATGATGTTTTGCACCTTATTGAAAAGAAAATTTTAAGTGC^ 10260 

10261 AA CrrXTlT CaiTATGCGTGCATACTCrCTGTAATTCCAGTGTAAAATO 103 SO 

103 SI GAAGAATT^TATTCT ATXUn 'CTAATCGTGGTGTGTCT A lTr rG T A GGAT^ 10440 

10441 GATGGTGCTTGCAGGTTTTCTAGGTAGAAATTATTTCATTATTATAATAAAACa^^ 10530 

10531 TAAATTGTCTGTATACCAGTAaUkGTTTA ri^in ' l CAGTATACTCGTACTAATAA 10620 

10621 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 10660 
gtttctatgcatag



ctcgaccattgcag 



ttgtttgactgcag 



ttttataattacag 



tttttctattccag 



tatttccatgctag 



cttccctttcccag 



ccctgtttccacag 



158 

GTTTTACC 
207 

GAGCATCG 
322 

CATACTGG 
448 

GTCTAGGC 
576 

TTTTCCAA 

638 

GTATTTCT 
776 

CATCCAGA 
2857 

GTCAGCGT 



Exon 1 
Exon 2 
Exon 3 
Exon 4 
Exon 5 
Exon 6 
Exon 7 
Exon 8 
Exon 9 



157 
PROTEIN WHICH INTERACTS WITH THE HUNTINGTON'S DISEASE GENE
PRODUCT, cDNA CODING THEREFOR, AND ANTIBODIES THERETO 

BACKGROUND OF THE INVENTION 

This application relates to a protein designated as HIP1 which interacts with
the Huntington's Disease gene product, cDNA coding for HIP1, and methods and
compositions relating thereto.

"Interacting proteins" are proteins which associate in vivo to form specific 
stable complexes. Non-covalent bonds, including hydrogen bonds, hydrophobic interactions
and other molecular associations, form between the proteins when two protein surfaces are
matched or have affinity for each other. This affinity or match is required for the recognition
of the two proteins and the formation of a stable interaction. Protein-protein interactions are
involved in the assembly of enzyme subunits; in antigen-antibody reactions; in forming the
supramolecular structures of ribosomes, filaments, and viruses; in transport; and in the 
interaction of receptors on a cell with growth factors and hormones. 

Huntington's disease is an adult-onset disorder characterized by selective
neuronal loss in discrete regions of the brain and spinal cord that leads to progressive
movement disorder, personality change and intellectual decline. From onset, which generally
occurs around age 40, the disease progresses with worsening symptoms, ending in death
approximately 18 years after onset.

The biochemical cause of Huntington's disease has thus far not been 
determined. Various theories have been advanced, but each has failed to stand up to 
experimental evidence designed to test its validity. For example, it was suggested that the 
selective neuronal loss could be attributed to restricted expression of mRNA or proteins in
cells undergoing degeneration. No obviously altered levels of mRNA transcript or protein
expression have ever been observed in HD-affected tissues, however.

While the biochemical cause of Huntington's disease has remained elusive, a
mutation in a gene within the chromosome 4p16.3 subband has been identified and linked to the
disease. This gene, referred to as the Huntington's Disease or HD gene, contains three repeat
regions, a CAG repeat region and two CCG repeat regions. Testing of Huntington's disease
patients has shown that the CAG region is highly polymorphic, and that the number of CAG
repeat units in the CAG repeat region is a very reliable diagnostic indicator of having inherited
the gene for Huntington's disease. Thus, in control individuals and in individuals suffering
from neuropsychiatric disorders other than Huntington's disease, the number of CAG repeats
is between 9 and 35, while in individuals suffering from Huntington's disease the number of
CAG repeats is expanded and is 36 or greater.
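
The repeat-count criterion described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming that the repeat tract is the longest uninterrupted run of CAG triplets in a supplied sequence; the function names and the example sequence are hypothetical and are not part of the disclosed methods.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(n: int) -> str:
    """Apply the ranges stated in the text: 9 to 35 normal, 36 or more expanded."""
    if n >= 36:
        return "expanded"
    if 9 <= n <= 35:
        return "normal range"
    return "outside the ranges given in the text"


# Hypothetical input: 44 CAG units flanked by arbitrary sequence.
example = "ATGGCG" + "CAG" * 44 + "CCGCCA"
count = longest_cag_run(example)
print(count, classify_repeat_count(count))
```

Counts below 9 are reported separately because the text gives no classification for them.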

The protein product encoded by the HD gene has been localized to the
cytoplasm, including to the membranes of vesicles in the brain of both normal and
HD-affected individuals. To date, no differences have been observed at either the total RNA,
mRNA or protein levels between normal and HD-affected individuals. Thus, the function
of the HD protein and its role in the pathogenesis of Huntington's Disease remain to be
elucidated.

SUMMARY OF THE INVENTION

We have now identified a protein, designated as HIP1, that interacts
differently with the gene product of a normal (16 CAG repeat) and an expanded (>44
CAG repeat) HD gene. The HIP1 protein originally isolated from the yeast two-hybrid
screen is encoded by a 1.2 kb cDNA, devoid of stop codons, that is expressed as a 400
amino acid polypeptide. By further screening of a human frontal cortex cDNA library,
and employing the protocol for 5' Rapid Amplification of cDNA Ends (RACE), a total of
4795 nucleotides (with an open reading frame of 914 amino acids) of the 10 kb HIP1
message have been isolated to date. Expression of the HIP1 protein was found to be limited
to the brain, where the interaction of HIP1 with the HD protein appears to be necessary
for the association of the HD protein with the membrane or specific cytoskeletal
components to render it functional. Because HIP1 interacts with expanded HD protein less
well than with normal length HD, introduction of additional HIP1 or overexpression of
HIP1 can lead to increased functionality of the defective or normal HD protein.
Alternatively, modified forms of HIP1 which bind more effectively to expanded HD
could be introduced to convert the expanded HD protein into a functional molecule.
BRIEF DESCRIPTION OF THE DRAWING

Fig. 1 graphically depicts the amount of interaction between HIP1 and
Huntingtin proteins with varying lengths of polyglutamine repeat.

DETAILED DESCRIPTION OF THE INVENTION

The HIP1 protein which interacts with the HD gene product was identified
using the yeast two-hybrid system described in US Patent No. 5,283,173, which is
incorporated herein by reference. Briefly, this system utilizes two chimeric genes or plasmids
expressible in a yeast host. The yeast host is selected to contain a detectable marker gene
having a binding site for the DNA-binding domain of a transcriptional activator. The first
chimeric gene or plasmid encodes a DNA-binding domain which recognizes the binding site of
the selectable marker gene and a test protein or protein fragment. The second chimeric gene
or plasmid encodes a second test protein and a transcriptional activation domain. The two
chimeric genes or plasmids are introduced into the host cell and expressed, and the cells are
cultivated. Expression of the detectable marker gene only occurs when the gene product of
the first chimeric gene or plasmid binds to the binding site of the detectable marker
gene, and a transcriptional activation domain is brought into sufficient proximity to the
DNA-binding domain, an occurrence which is facilitated by protein-protein interactions between the
first and second test proteins. By selecting for cells expressing the detectable marker gene,
those cells which contain chimeric genes or plasmids for interacting proteins can be identified,
and the gene can be recovered and identified.
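
The reporter logic of the two-hybrid system described above can be summarized as a simple truth table. The following Python sketch is a deliberately simplified model, assuming that reporter expression is an all-or-nothing outcome; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YeastCell:
    has_bd_fusion: bool           # first chimeric gene: DNA-binding domain + test protein 1
    has_ad_fusion: bool           # second chimeric gene: test protein 2 + activation domain
    test_proteins_interact: bool  # whether the two test proteins bind each other


def reporter_expressed(cell: YeastCell) -> bool:
    """Reporter is on only if both fusions are present and the test proteins interact."""
    return cell.has_bd_fusion and cell.has_ad_fusion and cell.test_proteins_interact


cells = {
    "both fusions, interacting": YeastCell(True, True, True),
    "both fusions, non-interacting": YeastCell(True, True, False),
    "BD fusion only": YeastCell(True, False, False),
}
for label, cell in cells.items():
    state = "ON" if reporter_expressed(cell) else "off"
    print(f"{label}: reporter {state}")
```

In an actual screen the signal is graded rather than binary, which is why the quantitative assays described in the Examples are used to compare interaction strengths.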

In testing for Huntington Interacting Proteins, several different plasmids were
prepared containing portions of the HD gene. The first four, identified as 16pGBT9,
44pGBT9, 80pGBT9 and 128pGBT9, were GAL4 DNA binding domain-HD in-frame
fusions containing nucleotides 314 to 1955 (amino acids 1-540) of the published HD cDNA
sequences cloned into the vector pGBT9 (Clontech). These plasmids contain a CAG repeat
region of 16, 44, 80 and 128 glutamine-encoding repeats, respectively. A clone
(DMKΔBamHIpGBT9) was made by fusing a cDNA encoding the first 544 amino acids of the
myotonic dystrophy gene (a gift from R. Korneluk) in-frame with the GAL4-DNA BD of
pGBT9 and was used as a negative control.
These plasmids have been used to identify and characterize HIP1 and two
additional HD-interacting proteins, HIP2 and HIP3, and can be further used for the
identification of additional interacting proteins, and for tests to refine the region on the protein
in which the interaction occurs. Thus, a first aspect of the invention is these four plasmids,
and the use of these plasmids in identifying HD-interacting proteins. Furthermore, it will be
appreciated that the GAL4 DNA-binding and activating domains are not the only domains
which can be used in the yeast two-hybrid assay. Thus, in a broader sense, the invention
encompasses any chimeric genes or plasmids containing nucleotides 314 to 1955 of the HD
gene together with an activating or DNA-binding domain suitable for use in the yeast one-,
two- or three-hybrid assay for proteins critical in either binding to the HD protein or
responsible for regulated expression of the HD gene.

After introducing the plasmids into Y190 yeast host cells, transforming the host
cells with an adult human brain Matchmaker™ (Clontech) cDNA library coupled with a GAL4
activating domain, and selecting for the expression of two detectable marker genes to identify
clones containing genes for interacting proteins, the activating domain plasmids were
recovered and analyzed. As a result of this analysis, three different cDNA fragments were
identified as encoding HD-interacting proteins and designated as HIP1, HIP2 and HIP3.
The sequences of HIP1 and HIP3 are given in Seq. ID Nos. 1 and 3. The polypeptides which
each encodes are given by Seq. ID Nos. 2 and 4. Further investigation of the HIP1 cDNA
resulted in the characterization of an additional region of cDNA totaling 4795 bases and a
corresponding protein, the sequences of which are given by Seq. ID Nos. 5 and 6, respectively.

The cDNA molecules, particularly those encoding portions of HIP1, can be
explored using oligonucleotide probes, for example for amplification and sequencing. In
addition, oligonucleotide probes complementary to the cDNA can be used as diagnostic
probes to localize and quantify the presence of HIP1 DNA. Probes of this type with a one or
two base mismatch can also be used in site-directed mutagenesis to introduce variations into
the HIP1 sequence which may increase. Thus, a further aspect of the present invention is an
oligonucleotide probe, preferably having a length of from 15-40 bases, which specifically and
selectively hybridizes with the cDNA given by Seq. ID No. 1 or 5 or a sequence
complementary thereto. As used herein, the phrase "specifically and selectively hybridizes with" the
cDNA refers to primers which will hybridize with the cDNA under stringent hybridization
conditions.
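
As a rough illustration of what "complementary" and the preferred 15-40 base length mean in practice, the following Python sketch derives a candidate probe as the reverse complement of a target stretch and checks its length. The target sequence shown is hypothetical and is not taken from the sequence listing; hybridization stringency is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_preferred_length(probe: str, min_len: int = 15, max_len: int = 40) -> bool:
    """Check the preferred probe length range of 15 to 40 bases."""
    return min_len <= len(probe) <= max_len


# Hypothetical 21-base target stretch, for illustration only.
target = "ATGGCCAAGCTTGACAGTGTA"
probe = reverse_complement(target)
print(probe, has_preferred_length(probe))
```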

DNA sequencing of the HIP1 cDNA initially isolated from the yeast two-hybrid
screen revealed a 1.2 kb cDNA that shows no significant degree of nucleic acid identity with
any stretch of DNA using the blastn program at NCBI (blast@ncbi.nlm.nih.gov). When the
entire HIP1 cDNA sequence (SEQ ID NO: 5) is translated into a polypeptide, the entire HIP1
cDNA coding region (nucleotides 328-3069) is observed to be devoid of stop codons, and to produce
a 914 amino acid polypeptide. A polypeptide identity search revealed an identity match over
the entire length of the protein (46% conservation) with that of a hypothetical protein from
C. elegans (ZK370.3 protein; C. elegans cosmid ZK370). This C. elegans protein shares
identity with the mouse talin gene, which encodes a 217 kDa protein implicated in
maintaining integrity of the cytoskeleton. It also shares identity with the SLA2/MOP2/END4 gene
from Saccharomyces cerevisiae, which is known to code for an essential
cytoskeletal-associated protein required for the accumulation and/or maintenance of plasma membrane
H+-ATPase on the cell surface. When pairwise comparisons are performed between HIP1 and
the C. elegans ZK370.3 protein (Genpept accession number celzk370.3), they show 26%
complete identity and an overall 46% level of conservation. Comparative analysis between
HIP1 and SLA2/MOP2/END4 (EMBL accession number Z22811) demonstrates similar
conservation (20% identity, 40% conservation).
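
The stated coding interval and polypeptide length can be cross-checked with simple arithmetic: an inclusive interval from nucleotide 328 to 3069 spans 2742 nucleotides, which is exactly 914 codons. The short Python sketch below performs that check; the function name is hypothetical.

```python
def codons_in_interval(first_nt: int, last_nt: int) -> int:
    """Number of complete codons in an inclusive nucleotide interval."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError(f"interval of {length} nt is not a whole number of codons")
    return length // 3


print(codons_in_interval(328, 3069))  # 914
```

The same arithmetic is consistent with the original 1.2 kb clone encoding about 400 amino acids, since 400 codons correspond to 1200 nucleotides.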

HIP2 is a 2.0 kb cDNA that encodes all but the 5'-most 33 amino acids of
human E2-25K ubiquitin-conjugating enzyme. The resulting peptide has 100% identity with
the previously characterized bovine E2-25K protein. The cDNA has 95% nucleotide identity
with the bovine cDNA. Ubiquitin-conjugating enzyme is an important component in
ubiquitin-mediated protein degradation pathways.

No difference in the strength of the interaction between HIP2 and HD
constructs containing either 44 or 15 CAG repeats is detected using a quantitative
β-galactosidase assay. The expression pattern of HIP2 (E2-25K) in the various parts of the
brain and nervous system appears to follow the specific neuropathology observed in HD,
although there does not appear to be any difference in expression levels between
HD-affected and HD-non-affected individuals.
The third cDNA encoding an HD-interacting protein is a 537 bp cDNA
coding for 187 amino acids. A search of known DNA databases did not identify
sequence homology with any known genes. However, when a protein search was
performed using the blastp server, a strong identity between HIP3 and ankyrin-related proteins
was observed. The strongest identity was with the D2021.8 gene product of C. elegans, an
uncharacterized gene, but there is also a 41% identity with AKR1, a yeast ankyrin
repeat-containing protein. Furthermore, when analogous structures with charge conservation over
the same amino acid stretch are considered, there is 70% protein identity. HIP3 also shares
approximately 60% amino acid conservation with human brain specific ankyrins (ankyrin B
and ankyrin C). Thus, it is reasonable to conclude that HIP3, like known ankyrins, is a
cytoskeletal protein, and may be involved, like previously characterized ankyrins, in
promoting interactions between the membrane skeleton and other membrane proteins.

Further exploration of these three HD interacting proteins revealed several
important facts about HIP1 that implicate it significantly in the pathogenesis of
Huntington's Disease. First, as shown in Fig. 1, it was found that the strength of the
interaction between HD protein and HIP1 is dependent on the number of CAG repeats.
Second, it was found that expression of the HIP1 protein is not ubiquitous, but is limited to
brain tissue. The highest amounts of expression are in the cortex, with lower levels being
seen in the cerebellum, caudate and putamen.

Both HIP1 and HIP3 appear to be proteins which are involved in
maintaining the structural integrity of the cytoskeleton and various components of the
cellular membrane, including microtubules and focal adhesions. Based upon this, the HD
protein may be associated as part of the cytoskeletal matrix in cells where it is expressed,
and our work supports the conclusion that binding of HIP1 to the HD protein is necessary
for the functional incorporation of the HD protein into the cell membrane. In this
circumstance, the larger polyglutamine tract in huntingtin has a decreased ability for an
HIP1-HD interaction. This decreased affinity for each other disrupts the normally strong
HD-HIP1-cytoskeletal anchoring association. Further, the HIP1-HD interaction may be a
critical interaction at the membranes of synaptic vesicles, and a decrease in the affinity of
HIP1 for huntingtin may affect protein trafficking or membrane organization throughout
the neuron. Finally, we have demonstrated that HIP1 and HD are both found in the Triton
X-100 insoluble membrane compartment of the cell; therefore, a decreased interaction
between HIP1 and huntingtin may allow an abnormally subtle amount of huntingtin to be
found in subcellular compartments in which it is normally found.

As a result of all three of these phenomena, increased apoptosis can occur
in specific neurons within the striatum. This increase in apoptosis arises from an increased
susceptibility of polyglutamine-expanded huntingtin to cleavage by apopain, and because
more of the expanded forms of the HD protein may be available for cleavage (and
subsequent apoptosis) due to the fact they are not as tightly associated at the
HD-HIP1-cytoskeletal complex.

This understanding of a biochemical basis for the pathogenesis of 
Huntington's Disease opens the doorway to a therapeutic method to ameliorate the 
pathology in patients expressing huntingtin protein with expanded polyglutamine tracts. In 
accordance with the method, the patient is treated to increase the amount of HIP1 or an
equivalent polypeptide which interacts less well with expanded Huntingtin than with
Huntingtin having a CAG repeat region containing 15 to 35 repeats and facilitates the
incorporation of Huntingtin into brain cell membranes.

Increasing expression of HIP1 or an equivalent polypeptide can be
accomplished using gene therapy approaches. In general, this will involve introduction of
DNA encoding HIP1 in an expressible vector into the brain cells. Vectors which have
been shown to be suitable expression systems in mammalian cells include the herpes
simplex viral based vectors: pHSV1 (Geller et al. Proc. Natl. Acad. Sci. 87:8950-8954
(1990)); recombinant retroviral vectors: MFG (Jaffee et al. Cancer Res. 53:2221-2226
(1993)); Moloney-based retroviral vectors: LN, LNSX, LNCX, LXSN (Miller and
Rosman Biotechniques 7:980-989 (1989)); vaccinia viral vector: MVA (Sutter and Moss
Proc. Natl. Acad. Sci. 89:10847-10851 (1992)); recombinant adenovirus vectors: pJM17
(Ali et al. Gene Therapy 1:367-384 (1994)), (Berkner K. L. Biotechniques 6:616-624
(1988)); second generation adenovirus vector: DE1/DE4 adenoviral vectors (Wang and
Finer Nature Medicine 2:714-716 (1996)); and adeno-associated viral vectors:
AAV/Neo (Muro-Cacho et al. J. Immunotherapy 11:231-237 (1992)).
Delivery of retroviral vectors to brain and nervous system tissue has been
described in US Patents Nos. 4,866,042, 5,082,670 and 5,529,774, which are incorporated
herein by reference. These patents disclose the use of cerebral grafts or implants as one
mechanism for introducing vectors bearing therapeutic gene sequences into the brain, as
well as an approach in which the vectors are transmitted across the blood brain barrier.

In addition to increasing the amount of HIP1 present in brain cells of
affected individuals, the HD lethal phenotype may be rescued by coexpression of HIP1 and a
normal sized HD protein within the same cell, specifically within neurons. The
overexpression of the normal HD protein and the presence of excess HIP1 in the cell may be
able to override the damaging effects of a decreased interaction between HIP1 and an
expanded form of the HD protein. Therefore, a "normal state" of interaction of HD with
HIP1 will rescue the cell from premature apoptotic death. Thus, a therapeutically desirable
mammalian expression vector may include both a region encoding HIP1 and a region
encoding normal (less than 35 repeats) HD protein.

To further illustrate the methods of making the materials which are the 
subject of this invention, and the testing which has established their utility, the following 
non-limiting experimental procedures are provided. 

EXAMPLE 1 

IDENTIFICATION OF INTERACTING PROTEINS

GAL4-HD cDNA constructs

An HD cDNA construct (44pGBT9) with 44 CAG repeats was generated,
encompassing amino acids 1-540 of the published HD cDNA. This cDNA fragment was
fused in frame to the GAL4 DNA-binding domain (BD) of the yeast two-hybrid vector
pGBT9 (Clontech). Other HD cDNA constructs, 16pGBT9, 80pGBT9 and 128pGBT9,
were constructed identically to 44pGBT9 but included 16, 80 or 128 CAG repeats,
respectively.

Another clone (DMKΔBamHIpGBT9), containing the first 544 amino acids
of the myotonic dystrophy gene (a gift from R. Korneluk) fused in-frame with the
GAL4-DNA BD of pGBT9, was used as a negative control. Plasmids expressing
GAL4-BD-RAD7 (D. Gietz, unpublished) and SIR3 were used as a positive control for the
β-galactosidase filter assay.

The clones IT15-23Q, IT15-44Q and HAP1 were generous gifts from Dr. C.
Ross. These clones represent a previously isolated huntingtin interacting protein that has a
higher affinity for the expanded form of the HD protein.

Yeast strains, transformations and β-galactosidase assays

The yeast strain Y190 (MATa leu2-3,112, ura3-52, trp1-901, his3-Δ200,
ade2-101, gal4Δ gal80Δ, URA3::GAL-lacZ, LYS2::GAL-HIS3, cyhr) was used for all
transformations and assays. Yeast transformations were performed using a modified lithium
acetate transformation protocol, and the yeast were grown at 30°C using appropriate synthetic
complete (SC) dropout media.

The β-galactosidase chromogenic filter assays were performed by
transferring the yeast colonies onto Whatman filters. The yeast cells were lysed by submerging the
filters in liquid nitrogen for 15-20 seconds. Filters were allowed to dry at room
temperature for at least five minutes and placed onto filter paper presoaked in Z-buffer (100 mM
sodium phosphate (pH 7.0), 10 mM KCl, 1 mM MgSO4) supplemented with 50 mM
2-mercaptoethanol and 0.07 mg/ml 5-bromo-4-chloro-3-indolyl β-D-galactoside (X-gal).
Filters were placed at 37°C for up to 8 hours.

Yeast two-hybrid screening for huntingtin interacting protein (HIP)

cDNAs from a human adult brain Matchmaker™ cDNA library (Clontech)
were transformed into the yeast strain Y190 already harboring the 44pGBT9 construct. The
transformants were plated onto one hundred 150 mm x 15 mm circular culture dishes
containing SC media deficient in Trp, Leu and His. The herbicide 3-amino-triazole (3-AT)
(25 mM) was utilized to limit the number of false His+ positives (31). The yeast transformants
were placed at 30°C for 5 days and β-galactosidase filter assays were performed on all
colonies found after this time, as described above, to identify β-galactosidase+ clones.
Primary His+/β-galactosidase+ clones were then patched in order onto a grid on SC
-Trp/-Leu/-His (25 mM 3-AT) plates and assayed again for His+ growth and the ability to turn
blue with a filter assay. Secondary positives were identified for further analysis. Proteins
encoded by positive cDNAs were designated as HIPs (Huntingtin Interactive Proteins).
Approximately 4.0 x 10^ Trp/Leu auxotrophic transformants were screened and, of 14 clones
isolated, 12 represented the same cDNA (HIP1), and the other 2 cDNAs, HIP2 and HIP3,
were each represented only once.

The HIP cDNA plasmids were isolated by growing the
His+/β-galactosidase+ colony in SC -Leu media overnight, lysing the cells with acid-washed glass
beads and electroporating the bacterial strain KC8 (leuB auxotrophic) with the yeast lysate.
The KC8 ampicillin resistant colonies were replica plated onto M9 (-Leu) plates. The
plasmid DNA from M9+ colonies was transformed into DH5-α for further manipulation.

EXAMPLE 2 
CONFIRMATION OF INTERACTIONS 

The HIP1-GAL4-AD cDNA activated both the lacZ and His reporter genes in
the yeast strain Y190 only when co-transformed with the GAL4-BD-HD construct, but not with
the negative controls (Figure 1) of the vector alone or a random fusion protein of the
myotonin kinase gene. In order to assess the influence of the polyglutamine tract on the
interaction between HIP1 and HD, semi-quantitative β-galactosidase assays were
performed. GAL4-BD-HD fusion proteins with 16, 44, 80 and 128 glutamine repeats were
assayed for their strength of interaction with the GAL4-AD-HIP1 fusion protein.

Liquid β-galactosidase assays were performed by inoculating a single yeast
colony into appropriate synthetic complete (SC) dropout media and growing it to OD600
0.6-1.5. Five millilitres of overnight culture was pelleted and washed once with 1 ml of
Z-Buffer, then resuspended in 100 µl Z-Buffer supplemented with 38 mM
2-mercaptoethanol and 0.05% SDS. Acid-washed glass beads (100 µl) were added to each sample
and vortexed for four minutes by repeatedly alternating a 30 second vortex with 30
seconds on ice. Each sample was pelleted and 10 µl of lysate was added to 500 µl of lysis
buffer. The samples were incubated in a 30°C waterbath for 30 seconds and then 100 µl of
a 4 mg/ml o-nitrophenyl β-D-galactopyranoside (ONPG) solution was added to each tube.
The reaction was allowed to continue for 20 minutes at 30°C and stopped by the addition of
500 µl of 1 M Na2CO3 and placing the samples on ice. Subsequently, OD420 was taken in
order to calculate the β-galactosidase activity with the equation 1000 x OD420/(t x V x
OD600) where t is the elapsed time (minutes) and V is the amount of lysate used.
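
The activity equation above can be written as a small Python function. This is a sketch under stated assumptions: the text does not give the unit of V, so the example assumes millilitres, and the absorbance readings used are hypothetical values chosen only to show the arithmetic.

```python
def beta_gal_activity(od420: float, od600: float, minutes: float, lysate_volume: float) -> float:
    """Activity = 1000 x OD420 / (t x V x OD600), as given in the text."""
    if minutes <= 0 or lysate_volume <= 0 or od600 <= 0:
        raise ValueError("time, lysate volume and OD600 must be positive")
    return 1000 * od420 / (minutes * lysate_volume * od600)


# Hypothetical readings; lysate volume expressed in ml (an assumption).
activity = beta_gal_activity(od420=0.5, od600=1.0, minutes=20, lysate_volume=0.01)
print(round(activity, 1))
```

Because the result scales inversely with V, activities are only comparable between samples when the same volume unit is used throughout.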

The specificity of the HIP1-HD interaction can be observed using the
chromogenic filter assay. Only yeast cells harboring HIP1 and HD activate both the HIS
and lacZ reporter genes in the Y190 yeast host. The cells that contain HIP1 with HD
constructs with 80 or 128 CAG repeats turn blue approximately 45 minutes after the cells
with the smaller sized repeats (16 or 44).

No difference in the β-galactosidase activity was observed between the 16
and 44 repeats or between the 80 and 128 repeats. However, a significant difference
(p<0.05) in activity is seen between the smaller repeats (16 and 44) and the larger repeats
(80 and 128) (Figure 1).

EXAMPLE 3 

DNA SEQUENCING, cDNA ISOLATION AND 5' RACE
Oligonucleotide primers were synthesized on an ABI PCR-mate
oligo-synthesizer. DNA sequencing was performed using an ABI 373 fluorescent automated
DNA sequencer. The HIP cDNAs were confirmed to be in-frame with the GAL4-AD by
sequencing across the AD-HIP1 cloning junction using an AD oligonucleotide (5' GAA
GAT ACC CCA CCA AAC 3').

Subsequently, primer walking was used to determine the remaining
sequences. A human frontal cortex >4.0 kb cDNA library (a gift from S. Montal) was
screened to isolate the full length HIP1 gene. Fifty nanograms of a 558 base pair Eco RI
fragment from the original HIP1 cDNA was radioactively labeled with [α-32P]-dCTP using
nick-translation and the probe allowed to hybridize to filters containing > 105 pfu/ml of
the cDNA library overnight at 65°C in Church buffer (see Northern blot protocol). The
filters were washed at 65°C for 10 minutes with 1 X SSPE, 15 minutes at 65°C with 1 X
SSPE and 0.1% SDS, then for thirty minutes and fifteen minutes with 1 X SSPE and 0.1%
SDS. The filters were exposed to X-ray film (Kodak, XAR5) overnight at -70°C. Primary
positives were isolated and replated and subsequent secondary positives were hybridized
and washed as for the primary screen. The resulting positive phage were converted into
plasmid DNA by conventional methods (Stratagene) and the cDNA isolated and sequenced.

In order to obtain the most 5' sequence of the HIP1 gene, a Rapid
Amplification of cDNA Ends (RACE) protocol was performed according to the
manufacturer's recommendations (BRL). First strand cDNA was synthesized using the
oligo HIP1-242R (5' GCT TGA CAG TGT AGT CAT AAA GGT GGC TGC ACT CC
3'). After dCTP tailing the cDNA with terminal deoxy transferase, two rounds of 35
cycles (94°C 1 minute; 53°C 1 minute; 72°C 2 minutes) of PCR using HIP1-R2 (5' GGA
CAT GTC CAG GGA GTT GAA TAC 3') and an anchor primer (5' (CUA)4 GGC CAC
GCG TCG ACT AGT ACG GGI IGG GII GGG IIG 3') (BRL) were performed. The
subsequent 650 base pair PCR product was cloned using the TA cloning system
(Invitrogen) and sequenced using T3 and T7 primers. Sequence ID Nos. 1 and 5 show the
sequence of the HIP1 cDNAs obtained.
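
Basic properties of the two gene-specific RACE primers, and the total hold time implied by the cycling conditions, can be tabulated with a short Python sketch. The primer sequences are copied as printed in the paragraph above; the helper names are hypothetical, and the time estimate counts only the stated hold steps, ignoring ramp times and any initial denaturation or final extension, which the text does not describe.

```python
RACE_PRIMERS = {
    "HIP1-242R": "GCT TGA CAG TGT AGT CAT AAA GGT GGC TGC ACT CC",
    "HIP1-R2": "GGA CAT GTC CAG GGA GTT GAA TAC",
}


def clean(oligo: str) -> str:
    """Remove the triplet spacing used in the printed sequence."""
    return oligo.replace(" ", "").upper()


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def cycling_hold_minutes(cycles: int, step_minutes: tuple) -> int:
    """Total hold time for one round of PCR, counting only the listed steps."""
    return cycles * sum(step_minutes)


for name, oligo in RACE_PRIMERS.items():
    seq = clean(oligo)
    print(f"{name}: {len(seq)} nt, GC fraction {gc_fraction(seq):.2f}")

per_round = cycling_hold_minutes(35, (1, 1, 2))
print(f"Hold time per round: {per_round} min; two rounds: {2 * per_round} min")
```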

EXAMPLE 4 
DNA AND AMINO ACID ANALYSES 
Overlapping DNA sequence was assembled using the program MacVector
and sent via email or Netscape to the BLAST server at NIH (http://www.ncbi.nlm.nih.gov)
to search for sequence similarities with known DNA (blastn) or protein (tblastn) sequences.
Amino acid alignments were performed with the program ClustalW.

EXAMPLE 5

FISH DETECTION SYSTEM AND IMAGE ANALYSIS 
The HIP1 cDNA isolated from the two-hybrid screen was mapped by
fluorescent in situ hybridization (FISH) to normal human lymphocyte chromosomes
counterstained with propidium iodide and DAPI. Biotinylated probe was detected with
avidin-fluorescein isothiocyanate (FITC). Images of metaphase preparations were captured
by a thermoelectrically cooled charge coupled camera (Photometrics). Separate images of
DAPI banded chromosomes and FITC targeted chromosomes were obtained. Hybridization
signals were acquired and merged using image analysis software and pseudo colored blue
(DAPI) and yellow (FITC) as described and overlaid electronically. This study showed
that HIP1 maps to a single genomic locus at 7q11.2.

EXAMPLE 6 
NORTHERN BLOT ANALYSIS

RNA was isolated using the single step method of homogenization in
guanidinium isothiocyanate and fractionated on a 1.0% agarose gel containing 0.6 M
formaldehyde. The RNA was transferred to a Hybond N membrane (Amersham) and
crosslinked with ultraviolet radiation.

Hybridization of the Northern blot with β-actin as an internal control probe
provided confirmation that the RNA was intact and had transferred. The 1.2 kb HIP1
cDNA was labeled using nick translation and incorporation of α-32P-dCTP. Hybridization
of the original 1.2 kb HIP1 cDNA was carried out in Church buffer (0.5 M sodium
phosphate buffer, pH 7.2, 2.7% sodium dodecyl sulphate, 1 mM EDTA) at 55°C
overnight. Following hybridization, Northern blots were washed once for 10 minutes in 2.0 X
SSPE, 0.1% SDS at room temperature and twice for 10 minutes in 0.15 X SSPE, 0.1%
SDS. Autoradiography was carried out for one to three days using Hyperfilm
(Amersham) film at -70°C.

Analysis of HIP1 RNA levels by Northern blot revealed
that the 10 kilobase HIP1 message is present in all tissues assessed. However, the levels of
RNA are not uniform, with brain having the highest levels of expression and peripheral tissues
having less message. No apparent differences in RNA expression were noted between
control samples and HD affected individuals.

EXAMPLE 7 
TISSUE LOCALIZATION OF HIP1
Tissue localization of HIP1 was studied using a variety of techniques as
described below.

Subcellular distribution of HIP1 protein in adult human and mouse brain

Biochemical fractionation studies revealed HIP1 to be a
membrane-associated protein. No immunoreactivity was seen by Western blotting in
cytosolic fractions, using the anti-HIP1-pep1 polyclonal antibody. HIP1 immunoreactivity
was observed in all membrane fractions including nuclei (P1), mitochondria and
synaptosomes (P2), microsomes and plasma membranes (P3). The P3 fraction contained the most
HIP1 compared to other membrane fractions. HIP1 could be removed from membranes by
high salt (0.5M NaCl) buffers, indicating it is not an integral membrane protein. However,
since low salt (0.1-0.25M NaCl) was only able to partially remove HIP1 from membranes,
its membrane association is relatively strong. The extraction of P3 membranes with the
non-ionic detergent Triton X-100 revealed HIP1 to be a Triton X-100 insoluble protein.
This characteristic is shared by many cytoskeletal and cytoskeletal-associated membrane
proteins including actin, which was used as a control in this study. The biochemical
characteristics of HIP1 described were found to be identical in mouse and human brain and
were the same for both forms of the protein (both bands of the HIP1 doublet). HIP1
co-localized with huntingtin in the P2 and P3 membrane fractions, including the high-salt
membrane extractions, as well as in the Triton X-100 insoluble residue. The subcellular
distribution of HIP1 was unaffected by the expression of polyglutamine-expanded
huntingtin in transgenic mice and HD patient brain samples.

The localization of HIP1 protein was further investigated by
immunohistochemistry in normal adult mouse brain tissue. Immunoreactivity was seen in a patchy,
reticular pattern in the cytoplasm, appeared excluded from the nucleus and stained most
intensely in a discontinuous pattern at the membrane. These results are consistent with the
association of HIP1 with the cytoskeletal matrix and further indicate an enrichment of HIP1
at plasma membranes. Immunoreactivity occurred in all regions of the brain, including
cortex, striatum, cerebellum and brainstem, but appeared most strongly in neurons and
especially in cortical neurons. As described previously, huntingtin immunoreactivity was
seen exclusively and uniformly in the cytosol.

The in situ hybridization studies showed HIP1 mRNA to be ubiquitously and
generally expressed throughout the brain. These data are consistent with the
immunohistochemical results and were identical to the distribution pattern of huntingtin mRNA in
transgenic mouse brains expressing full-length human huntingtin.

Protein Preparation And Western Blotting For Expression Studies

Frozen human tissues were homogenized using a Polytron in a buffer
containing 0.25M sucrose, 20mM Tris-HCl (pH 7.5), 10mM EGTA, 2mM EDTA
supplemented with 10ug/ml of leupeptin, soybean trypsin inhibitor and 1mM PMSF, then
centrifuged at 4,000rpm for 10 minutes at 4 C to remove cellular debris. 100-150ug/lane of protein
was separated on 8% SDS-PAGE mini-gels and then transferred to PVDF membranes.
Huntingtin and HIP1 were electroblotted overnight in Towbin's transfer buffer (25 mM
Tris-HCl, 0.192M glycine, pH8.3, 10% methanol) at 30V onto PVDF membranes
(Immobilon-P, Millipore) as described (Towbin et al., Proc. Nat'l Acad. Sci. (USA) 76:
4350-4354 (1979)). Membranes were blocked for 1 hour at room temperature in 5% skim
milk/TBS (10mM Tris-HCl, 0.15M NaCl, pH7.5). Antibodies against huntingtin (pAb
BKP1, 1:500), actin (mAb A-4700, Sigma, 1:500) or HIP1 (pAb HIP-pep1, 1:200) were
added to blocking solution for 1 hour at room temperature. After 3 x 10 minute washes
in TBS-T (0.05% Tween-20/TBS), secondary Ab (horseradish peroxidase conjugated IgG,
Biorad) was applied in blocking solution for 1 hour at room temperature. Membranes were
washed and then incubated in chemiluminescent ECL solution and visualized using
Hyperfilm-ECL film (Amersham).

Generation of Antibodies 

The generation of huntingtin specific antibodies GHM1 and BKP1 is
described elsewhere (Kalchman, et al., J. Biol. Chem. 271: 19385-19394 (1996)). The HIP1
peptide (VLEKDDLMDMDASQQN, a.a. 76-91 of Seq. ID No. 2) was synthesized with
Cys on the N-terminus for the coupling, and coupled to Keyhole limpet hemocyanin
(KLH) (Pierce) with succinimidyl 4-(N-maleimidomethyl) cyclohexane-1-carboxylate
(Pierce). Female New Zealand White rabbits were injected with HIP1 peptide-KLH and
Freund's adjuvant. Antibodies against the HIP1 peptide were purified from rabbit sera
using an affinity column with low pH elution. The affinity column was made by incubation of
HIP1 peptide with activated thio-Sepharose (Pharmacia).
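
The relationship between the one-letter peptide and the three-letter listing can be checked with a short sketch. The residues below are copied from positions 76-91 of SEQ ID NO 2 as listed in this document. The variable and function names are hypothetical, and the added N-terminal Cys simply mirrors the coupling strategy described above.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Residues 76-90 (one row of the listing) and residue 91 of SEQ ID NO 2.
ROW_76_TO_90 = "Val Leu Glu Lys Asp Asp Leu Met Asp Met Asp Ala Ser Gln Gln"
RESIDUE_91 = "Asn"


def to_one_letter(three_letter_residues: str) -> str:
    """Convert space-separated three-letter residues to a one-letter string."""
    return "".join(THREE_TO_ONE[r] for r in three_letter_residues.split())


peptide = to_one_letter(ROW_76_TO_90 + " " + RESIDUE_91)
immunogen = "C" + peptide  # Cys added at the N-terminus for coupling

print(peptide, immunogen)
```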

Western blotting of various peripheral and brain tissues was consistent with
the RNA data. The HIP1 protein observed was not ubiquitous. The protein
expression is limited to brain tissue, with highest amounts seen in the cortex and lower
levels seen in the cerebellum, caudate and putamen.

More region-specific analysis of HIP1 expression in the brain revealed no
differential expression pattern in affected individuals when compared to normal controls,
with highest levels of expression seen in both controls and HD patients in the cortical
regions.

EXAMPLE 8

CO-IMMUNOPRECIPITATION OF HIP1 WITH HUNTINGTIN

Confirmation of the HD-HIP1 interaction was performed using
coimmunoprecipitation as follows. Control human brain (frontal cortex) lysate was prepared in the
same manner as for the subcellular localization study. Prior to immunoprecipitation, tissue
lysate was centrifuged at 5000 rpm for 2 minutes at 4 C, then the supernatant was
pre-cleared by incubation with an excess amount of Protein A-Sepharose for 30 minutes at
4 C, and centrifuged under the same conditions. Fifty microlitres of supernatant (500 ug
protein) was incubated with or without antibodies (10 ug of anti-huntingtin GHM1
(Kalchman, et al. 1996) or anti-synaptobrevin antibody) in a total of 500 ul of incubation
buffer (20mM Tris-Cl (pH7.5), 40mM NaCl, 1mM MgCl2) for 1 hour at 4 C. Twenty
microlitres of Protein A-Sepharose (1:1 suspension, for GHM1 and no antibody control) or
Protein G-Sepharose (for anti-synaptobrevin antibody; Pharmacia) was added and
incubated for 1 hour at 4 C. The beads were washed with washing buffer (incubation
buffer containing 0.5% Triton X-100) three times. The samples on the beads were
separated using SDS-PAGE (7.5% acrylamide) and transferred to PVDF membrane
(Immobilon-P, Millipore). The membrane was cut at about 150 kDa after transfer for
Western blotting (as described above). The upper piece was probed with anti-huntingtin
BKP1 (1/1000) and the lower piece with anti-HIP1 antibody (1/300).

The results showed that when an anti-HIP1 polyclonal antibody was
immunoreacted against a blot containing the GHM1 immunoprecipitates from the brain
lysate, a doublet was observed at approximately 100 kDa. When GHM1 was
immunoreacted against the same immunoprecipitate, the 350 kDa HD protein was also seen. The
specificity of the HD-HIP1 interaction is shown by the absence of immunoreactive bands
resulting from proteins adsorbing to the Protein A-Sepharose (Lysate + No Antibody) or
when a random, unrelated antibody (Lysate + anti-Synaptobrevin) is used as the
immunoprecipitating antibody.

WO 97/18825 PCT/US96/18370



EXAMPLE 9

Subcellular fractionation of brain tissue

Cortical tissue (20-100 mg/ml) was homogenized, on ice, in a 2 ml
pyrex-teflon IKA-RW15 homogenizer (Tekmar Company) in a buffer containing 0.303M
sucrose, 20mM Tris-HCl pH 6.9, 1mM MgCl2, 0.5mM EDTA, 1mM PMSF, 1mM
leupeptin, soybean trypsin inhibitor and 1mM benzamidine (Wood et al., Human Molec.
Genet. 5: 481-487 (1996)).

Crude membrane vesicles were isolated by two cycles of a three-step
differential centrifugation protocol in a Beckman TLA 120.2 rotor at 4 C based on the methods
of Wood et al. (1996). The first step precipitated cellular debris and nuclei from tissue
homogenates for 5 minutes at 1300 x g (P1). The 1300 x g supernatant was subsequently
centrifuged for 20 minutes at 14 000 x g to isolate synaptosomes and mitochondria (P2).
Finally, microsomal and plasma membrane vesicles were collected by a 35 minute
centrifugation at 142 000 x g (P3). The remaining supernatant was defined as the cytosolic
fraction.
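
The three-step scheme above can be summarized as a small data structure. The sketch below is only a bookkeeping aid: the class and function names are hypothetical, and the forces, times and fraction contents are those stated in the preceding paragraph for a single cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spin:
    pellet: str     # name given to the pellet
    contents: str   # what the pellet is expected to contain
    rcf_g: int      # relative centrifugal force, in x g
    minutes: int    # duration of the spin


# One cycle of the differential centrifugation described in the text.
PROTOCOL = [
    Spin("P1", "cellular debris and nuclei", 1300, 5),
    Spin("P2", "synaptosomes and mitochondria", 14000, 20),
    Spin("P3", "microsomes and plasma membranes", 142000, 35),
]


def describe(protocol):
    """Return one line per spin, followed by the final supernatant."""
    lines = []
    source = "homogenate"
    for spin in protocol:
        lines.append(
            f"{source} -> {spin.rcf_g} x g, {spin.minutes} min -> "
            f"pellet {spin.pellet} ({spin.contents})"
        )
        source = f"{spin.rcf_g} x g supernatant"
    lines.append(f"{source} -> cytosolic fraction")
    return lines


total_minutes = sum(spin.minutes for spin in PROTOCOL)

for line in describe(PROTOCOL):
    print(line)
```

Each spin takes the supernatant of the previous one as its input, so the pellets partition the homogenate by sedimentation rate and whatever never pellets is the cytosolic fraction.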

High salt extraction of membranes

Aliquots of P3 membranes were twice suspended at 2mg/ml in 0.5M NaCl,
10mM Tris-HCl, 2mM MgCl2, pH7.2, containing protease inhibitors (see above). The
same buffer without NaCl was used as a control. The membrane suspensions were
incubated on ice for 30 minutes and then centrifuged at 142 000 x g for 30 minutes.

Extraction of cytoskeletal and cytoskeletal-associated proteins

To extract cytoskeletal proteins, crude membrane vesicles from the P3
membrane fraction were suspended in a volume of Triton X-100 extraction buffer to give a
protein:detergent ratio of 5:1. The composition of the Triton X-100 extraction buffer was
based on the methods of Arai et al., J. Neuroscience 38: 348-357 (1994) and contained
2% Triton X-100, 10mM Tris-HCl, 2mM MgCl2, 1mM leupeptin, soybean trypsin
inhibitor, PMSF and benzamidine. Membrane pellets were suspended by hand with a
round-bottom teflon pestle, and placed on ice for 40 minutes. Insoluble cytoskeletal
matrices were precipitated for 35 minutes at 142 000 x g in a Beckman TLA 120.2 rotor.
The supernatant was defined as non-cytoskeletal-associated membrane or
membrane-associated protein and was removed. The remaining pellet was extracted with Triton
X-100 a second time using the same conditions. We defined the final pellet as cytoskeletal
and cytoskeletal-associated protein.

Solubilization of protein and analysis by SDS-PAGE and Western Blotting

Membrane and cytoskeletal protein was solubilized in a minimum volume of
1% SDS, 3M urea, 0.1mM dithiothreitol in TBS buffer and sonicated. Protein
concentration was determined using the BioRad DC Protein assay and samples were diluted at
least 1 X with 5 X sample buffer (250mM Tris-HCl pH 6.8, 10% SDS, 25% glycerol,
0.02% bromophenol blue and 7% 2-mercaptoethanol) and were loaded on 7.5%
SDS-PAGE gels (Bio-Rad Mini-PROTEAN II Cell system) without boiling. Western
blotting was performed as described above.
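
The text does not state the final sample buffer strength. As a minimal sketch, assuming the usual aim of a 1 X final concentration, the volume of 5 X sample buffer to add to a given sample volume can be worked out as follows. The function name and the 40 ul example are hypothetical.

```python
from fractions import Fraction


def stock_buffer_volume(sample_volume_ul, stock_x=5, final_x=1):
    """Volume of concentrated buffer to add so the mixture is final_x.

    Solves stock_x * Vb / (Vs + Vb) = final_x for Vb.
    """
    if stock_x <= final_x:
        raise ValueError("stock must be more concentrated than the final strength")
    return Fraction(sample_volume_ul) * final_x / (stock_x - final_x)


# Assumed example: 40 ul of solubilized protein brought to 1 X sample buffer.
buffer_ul = stock_buffer_volume(40)
print(buffer_ul)
```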


Immunohistochemistry

Brain tissue was obtained from a normal C57BL/6 adult (6 months old) male
mouse sacrificed with chloroform then perfusion-fixed with 4% v/v paraformaldehyde/0.01
M phosphate buffer (4% PFA). The brain tissues were removed, immersion fixed in 4%
PFA for 1 day, washed in 0.01 M phosphate buffered saline, pH 7.2 (PBS) for 2 days, and
then equilibrated in 25% w/v sucrose PBS for 1 week. The samples were then snap-frozen
in Tissue Tek molds by isopentane cooled in liquid nitrogen. After warming to -20 C,
frozen blocks derived from frontal cortex, caudate/putamen, cerebellum and brainstem
were cut into 14 um sections for immunohistochemistry. Following washing in PBS, the
tissue sections were blocked using 2.5% v/v normal goat serum for 1 hour at room
temperature. Primary antibodies diluted with PBS were applied to sections overnight at 4
C. Optimal dilutions for the polyclonal antibodies BKP1 and HIP1 were 1:50. Using
washes of 3 x 5 minutes in PBS at room temperature, sections were sequentially incubated
with biotinylated secondary antibody and then an avidin-biotin complex reagent (Vectastain
ABC Kit, Vector) for 60 minutes each at room temperature. Color was developed
using 3,3'-diaminobenzidine tetrahydrochloride and ammonium nickel sulfate.

For controls, sections were treated as described above except that HIP1
antibody aliquots were preabsorbed with an excess of HIP1 peptide as well as a peptide
unrelated to HIP1 prior to incubation with the tissue sections.


In situ hybridization 

In situ hybridization was performed as previously described with some
modification (Suzuki et al., BBRC 219: 708-713 (1996)). The RNA probes were prepared
using the plasmid gt149 (Lin, B., et al., Human Molec. Genet. 2: 1541-1545 (1994)) or a
558 subclone of HIP1. The anti-sense and sense single-stranded RNA probes were
synthesized using T3 and T7 RNA polymerases and the In Vitro Transcription Kit
(Clontech) with the addition of [α-35S]-CTP (Amersham) to the reaction mixture. Sense
RNA probes were used as negative controls. For HIP1 studies normal C57BL/6 mice were
used. Huntingtin probes were tested on two different transgenic mouse strains expressing
full-length huntingtin, cDNA HD10366(44CAG) C57BL/6 mice and YAC
HD10366(18CAG) FVB/N mice. Frozen brain sections (10um thick) were placed onto
silane-coated slides under RNase-free conditions. The hybridization solution contained
40% w/v formamide, 0.02M Tris-HCl (pH 8.0), 0.005M EDTA, 0.3 M NaCl, 0.01M
sodium phosphate (pH 7.0), 1x Denhardt's solution, 10% w/v dextran sulfate (pH 7.0),
0.2% w/v sarcosyl, yeast tRNA (500ug/ml) and salmon sperm DNA (200ug/ml). The
radiolabelled RNA probe was added to the hybridization solution to give 1 x 10^6 cpm/200
ul/section. Sections were covered with hybridization solution and incubated on
formamide paper at 65 C for 18 hours. After hybridization, the slides were washed for 30
minutes sequentially with 2x SSC, 1x SSC and high stringency wash solution (50%
formamide, 2x SSC and 0.1 M dithiothreitol) at 65 C, followed by treatment with RNase A
(1mg/ml) at 37 C for 30 minutes, then washed again and air-dried. The slides were first
exposed on autoradiographic film (β-max, Amersham, UK) for 48 hours and developed for
4 minutes in Kodak D-19 followed by a 5 minute fixation in Fuji-fix. For longer
exposures, the slides were dipped in autoradiographic emulsion (50% w/v in distilled
water, NR-2, Konica, Japan), air-dried and exposed for 20 days at 4 C then developed as
described. Sections were counterstained with methyl green or Giemsa solutions.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Kalchman, Michael
Goldberg, Paul
Hayden, Michael R.

(ii) TITLE OF INVENTION: Protein Which Interacts with the Huntington's Disease Gene
Product, cDNA Coding Therefor, and Antibodies Thereto

(iii) NUMBER OF SEQUENCES: 8

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Oppedahl & Larson 

(B) STREET: 1992 Commerce Street Suite 309 

(C) CITY: Yorktown 

(D) STATE: NY 

(E) COUNTRY: USA 

(F) ZIP: 10598 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.44 Kb storage

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: MS DOS 5.0 

(D) SOFTWARE: WordPerfect 

(vi) CURRENT APPLICATION DATA 

(A) APPLICATION NUMBER

(B) FILING DATE 

(C) CLASSIFICATION 

(viii) ATTORNEY/ AGENT INFORMATION 

(A) NAME: Larson, Marina T.

(B) REGISTRATION NUMBER: 32038 

(C) REFERENCE/DOCKET NUMBER UBC P-013 

(ix) TELECOMMUNICATION INFORMATION 

(A) TELEPHONE: (914) 245-3252 

(B) TELEFAX: (914) 962-4330 

(2) INFORMATION FOR SEQ ID NO 1 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1164 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: no 

(iv) ANTI-SENSE: no 
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: human 

(ix) FEATURE: cDNA for Huntingtin-interacting protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO 1:






ACAGCTGACA CCCTGCAAGG CCACCGGGAC CGCTTCATGG AGCAGTTTAC 50
AAAGTTGAAA GATCTGTTCT ACCGCTCCAG CAACCTGCAG TACTTCAAGC 100
GGGTCATTCA GATCCCCCAG CTGCCTGAGA ACCCACCCAA CTTCCTGCGA 150
GCCTCAGCCC TGTCAGAACA TATCAGCCCT GTGGTGGTGA TCCCTGCAGA 200
GGCCTCATCC CCCGACAGCG AGCCAGTCCT AGAGAAGGAT GACCTCATGG 250
ACATGGATGC CTCTCAGCAG AATTTATTTG ACAACAAGTT TGATGACNTC 300
TTTGGCAGTT CATCCAGCAG TGATCCCTTC AATTTCAACA GTCAAAATGG 350
TGTGAACAAG GATGAGAAGG ACCACTTAAT TGAGCGACTA TACAGAGAGA 400
TCAGTGGATT GAAGGCACAG CTAGAAAACA TGAAGACTGA GAGCCAGCGG 450
GTTGTGCTGC AGCTGAAGGG CCACGTCAGC GAGCTGGAAG CAGATCTGGC 500
CGAGCAGCAG CACCTGCGGC AGCAGGCGGC CGACGACTGT GAATTCCTGC 550
GGGCAGAACT GGACGAGCTC AGGNGGCAGC GGGAGGACAC CGAGAAGGCT 600
CAGCGGAGCC TGTCTGAGAT AGAAAGGAAA GCTCAAGCCA ATGAACAGCG 650
ATATAGCAAG CTAAAGGAGA AGTACAGCGA GCTGGTTCAG AACCACGCTG 700
ACCTGCTGCG GAAGAATGCA GAGGTGACCA AACAGGTGTC CATGGCCAGA 750
CAAGCCCAGG TAGATTTGGA ACGAGAGAAA AAAGAGCTGG AGGATTCGTT 800
GGAGCGCATC AGTGACCAGG GCCAGCGGAA GACTCAAGAA GAGCTGGAAG 850
TTCTAGAGAG CTTGAAGCAG GAACTTGGCA CAAGCCAACG GGAGCTTCAG 900
GTTCTGCAAG GCAGCCTGGA AACTTCTGCC CAGTCAGAAG CAAACTGGGC 950
AGCCGAGTTC GCCGAGCTAG AGAAGGAGCG GGACAGCCTG GTGAGTGGCG 1000
CAGCTCATAG GGAGGAGGAA TTATCTGCTC TTCGGAAAGA ACTGCAGGAC 1050
ACTCAGCTCA AACTGGCCAG CACAGAGGAA TCTATGTGCC AGCTTGCCAA 1100
AGACCAACGA AAAATGCTTC TGGTGGGGTC CAGGAAGGCT GCGGAGCAGG 1150
TGATACAAGA CGCG 1164



(2) INFORMATION FOR SEQ ID NO 2 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 386

(B) TYPE protein 

(D) TOPOLOGY linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL no 
(vi) ORIGINAL SOURCE 
(A) ORGANISM: human 

(ix) FEATURE: Huntingtin-interacting protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2



Thr Ala Asp Thr Leu Gln Gly His Arg Asp Arg Phe Met Glu Gln
1               5                   10                  15
Phe Thr Lys Leu Lys Asp Leu Phe Tyr Arg Ser Ser Asn Leu Gln
                20                  25                  30
Tyr Phe Lys Arg Val Ile Gln Ile Pro Gln Leu Pro Glu Asn Pro
                35                  40                  45
Pro Asn Phe Leu Arg Ala Ser Ala Leu Ser Glu His Ile Ser Pro
                50                  55                  60
Val Val Val Ile Pro Ala Glu Ala Ser Ser Pro Asp Ser Glu Pro
                65                  70                  75
Val Leu Glu Lys Asp Asp Leu Met Asp Met Asp Ala Ser Gln Gln
                80                  85                  90
Asn Leu Phe Asp Asn Lys Phe Asp Asp Phe Gly Ser Ser Ser Ser
                95                  100                 105
Ser Asp Pro Phe Asn Phe Asn Ser Gln Asn Gly Val Asn Lys Asp
                110                 115                 120
Glu Lys Asp His Leu Ile Glu Arg Leu Tyr Arg Glu Ile Ser Gly
                125                 130                 135
Leu Lys Ala Gln Leu Glu Asn Met Lys Thr Glu Ser Gln Arg Val
                140                 145                 150
Val Leu Gln Leu Lys Gly His Val Ser Glu Leu Glu Ala Asp Leu
                155                 160                 165
Ala Glu Gln Gln His Leu Arg Gln Gln Ala Ala Asp Asp Cys Glu
                170                 175                 180
Phe Leu Arg Ala Glu Leu Asp Glu Leu Arg Gln Arg Glu Asp Thr
                185                 190                 195
Glu Lys Ala Gln Arg Ser Leu Ser Glu Ile Glu Arg Lys Ala Gln
                200                 205                 210
Ala Asn Glu Gln Arg Tyr Ser Lys Leu Lys Glu Lys Tyr Ser Glu
                215                 220                 225
Leu Val Gln Asn His Ala Asp Leu Leu Arg Lys Asn Ala Glu Val
                230                 235                 240
Thr Lys Gln Val Ser Met Ala Arg Gln Ala Gln Val Asp Leu Glu
                245                 250                 255
Arg Glu Lys Lys Glu Leu Glu Asp Ser Leu Glu Arg Ile Ser Asp
                260                 265                 270
Gln Gly Gln Arg Lys Thr Gln Glu Gln Leu Glu Val Leu Glu Ser
                275                 280                 285
Leu Lys Gln Glu Leu Gly Thr Ser Gln Arg Glu Leu Gln Val Leu
                290                 295                 300
Gln Gly Ser Leu Glu Thr Ser Ala Gln Ser Glu Ala Asn Trp Ala
                305                 310                 315
Ala Glu Phe Ala Glu Leu Glu Lys Glu Arg Asp Ser Leu Val Ser
                320                 325                 330
Gly Ala Ala His Arg Glu Glu Glu Leu Ser Ala Leu Arg Lys Glu
                335                 340                 345
Leu Gln Asp Thr Gln Leu Lys Leu Ala Ser Thr Glu Glu Ser Met
                350                 355                 360
Cys Gln Leu Ala Lys Asp Gln Arg Lys Met Leu Leu Val Gly Ser
                365                 370                 375
Arg Lys Ala Ala Glu Gln Val Ile Gln Asp Ala
                380                 385 386
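
The correspondence between the cDNA of SEQ ID NO 1 and the protein of SEQ ID NO 2 can be spot-checked by translating the first 50 nucleotides with the standard genetic code. The sketch below is illustrative only and its names are hypothetical. It reads the sixth codon as CAA, consistent with the Gln at position 6 of SEQ ID NO 2, and it ignores the incomplete trailing codon.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons (e.g. containing N) give X."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete final codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 50 nucleotides of SEQ ID NO 1 as listed above.
SEQ1_FIRST_50 = "ACAGCTGACA CCCTGCAAGG CCACCGGGAC CGCTTCATGG AGCAGTTTAC"
peptide = translate(SEQ1_FIRST_50)
print(peptide)
```

The first fifteen residues, TADTLQGHRDRFMEQ, agree with the first row of SEQ ID NO 2 (Thr Ala Asp Thr Leu Gln Gly His Arg Asp Arg Phe Met Glu Gln), and the sixteenth, Phe, begins the second row.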



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: no 

(iv) ANTI-SENSE: no
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: human 

(ix) FEATURE: cDNA for Huntingtin-interacting protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

ACCGATACCG AAGCGGGCTG TGTGCCCCTT CTCCACCCAG AGGAAATCAA 50 
ACCCCAAAGC CATTATAACC ATGGATATGG TGAACCTCTT GGACGGAAAA 100 
CTCATATTGA TGATTACAGC ACATGGGACA TAGTCAAGGC TACACAATAT 150 
GGAATATATG AACGCTGTCG AGAATTGGTG GAAGCAGGTT ATGATGTACG 200 
GCAACCGGAC AAAGAAAATG TTACCCTCCT CCATTGGGCT GCCATCAATA 250 
ACAGAATAGA TTTAGTCAAA TACTATATTT CGAAAGGTGC TATTGTGGAT 300
CAACTTGGAG GGGACCTGAA TTCAACTCCA TTGCACTGGG ACACAAGACA 350 
AGGCCATCTA TCCATGGTTG TGCAACTAAT GAAATATGGT GCAGATCCTT 400 
CATTAATTGA TGGAGAAGGA TGTAGCTGTA TTCATCTGGC TGCTCAGTTC 450 
GGACATACCT CAATTGTTGC TTATCTCATA GCAAAAGGAC AGGATGTG 498 



(2) INFORMATION FOR SEQ ID NO:4: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 154 

(B) TYPE: protein 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: no 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: human 
(ix) FEATURE: Huntingtin-interacting protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4

Thr Asp Thr Glu Ala Gly Cys Val Pro Leu Leu His Pro Glu Glu
1               5                   10                  15
Ile Lys Pro Gln Ser His Tyr Asn His Gly Tyr Gly Glu Pro Leu
                20                  25                  30
Gly Arg Lys Thr His Ile Asp Asp Tyr Ser Thr Trp Asp Ile Val
                35                  40                  45
Lys Ala Thr Gln Tyr Gly Ile Tyr Glu Arg Cys Arg Glu Leu Val
                50                  55                  60
Glu Ala Gly Tyr Asp Val Arg Gln Pro Asp Lys Glu Asn Val Thr
                65                  70                  75
Leu Leu His Trp Ala Ala Ile Asn Asn Arg Ile Asp Leu Val Lys
                80                  85                  90
Tyr Tyr Ile Ser Lys Gly Ala Ile Val Asp Gln Leu Gly Gly Asp
                95                  100                 105
Leu Asn Ser Thr Pro Leu His Trp Asp Thr Arg Gln Gly His Leu
                110                 115                 120
Ser Met Val Val Gln Leu Met Lys Tyr Gly Ala Asp Pro Ser Leu
                125                 130                 135
Ile Asp Gly Glu Gly Cys Ser Cys Ile His Leu Ala Ala Gln Phe
                140                 145                 150
Gly His Thr Ser
            154



(2) INFORMATION FOR SEQ ID NO: 5 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4846

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: no 

(iv) ANTI-SENSE: no 
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: human 

(ix) FEATURE: cDNA for Huntingtin-interacting protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CAGTGTACGG TTGATCATAT AACGCCGCGG GCGGGGATTG GTTTATATAT 50
CGCAAATTGA TNTAGGGGGG GGGGGATGGN CAGAGATTTC GCTTCATTAG 100
GCCATTATAA GCAGGAAGGG TTTCAAGGAA AAAAACCCAG AAAGTGCATA 150
TTGCACCCAC CATGAGAAAG GGGCAACAGA CCTTNTGTTN TGTTNTCAAC 200
CGCCTGCTTC TGTTTTAGCA ACGCAGTGTT TTGGTGGAAG TTGTGCCATG 250
TGTTCCACAA ANTCTTCCGA GATGGACACC CGAACGTCCT GAAGGACTTT 300
GTGAGATACA GAAATGAATT GAGTGACATG AGCAGGATGT GGGGCCACCT 350
GAGCGAGGGG TATGGCCAGC TGTGCAGCAT CTACCTGAAA CTGCTAAGAA 400
CCAAGATGGA GTACCACACC AAAAATCCCA GGTTCCCAGG CAACCTGCAG 450
ATGAGTGACC GCCAGCTGGA CGAGGCTGGA GAAAGTGACG TGAACAACTT 500
TTTCCAGTTA ACAGTGGAGA TGTTTGACTA CCTGGAGTGT GAACTCAACC 550
TCTTCCAAAC AGTATTCAAC TCCCTGGACA TGTCCCGCTC TGTGTCCGTG 600
ACGGCAGCAG GGCAGTGCCG CCTCGCCCCG CTGATCCAGG TCATCTTGGA 650
CTGCAGCCAC CTTTATGACT ACACTGTCAA GCTTCTCTTC AAACTCCACT 700
CCTGCCTCCC AGCTGACACC CTGCAAGGCC ACCGGGACCG CTTCATGGAG 750
CAGTTTACAA AGTTGAAAGA TCTGTTCTAC CGCTCCAGCA ACCTGCAGTA 800
CTTCAAGCGG CTCATTCAGA TCCCCCAGCT GCCTGAGAAC CCACCCAACT 850
TCCTGCGAGC CTCAGCCCTG TCAGAACATA TCAGCCCTGT GGTGGTGATC 900
CCTGCAGAGG CCTCATCCCC CGACAGCGAG CCAGTCCTAG AGAAGGATGA 950
CCTCATGGAC ATGGATGCCT CTCAGCAGAA TTTATTTGAC AACAAGTTTG 1000
ATGACATCTT TGGCAGTTCA TTCAGCAGTG ATCCCTTCAA TTTCAACAGT 1050
CAAAATGGTG TGAACAAGGA TGAGAAGGAC CACTTAATTG AGCGACTATA 1100
CAGAGAGATC AGTGGATTGA AGGCACAGCT AGAAAACATG AAGACTGAGA 1150
GCCAGCGGGT TGTGCTGCAG CTGAAGGGCC ACGTCAGCGA GCTGGAAGCA 1200
GATCTGGCCG AGCAGCAGCA CCTGCGGCAG CAGGCGGCCG ACGACTGTGA 1250
ATTCCTGCGG GCAGAACTGG ACGAGCTCAG GAGGCAGCGG GAGGACACCG 1300
AGAAGGCTCA GCGGAGCCTG TCTGAGATAG AAAGGAAAGC TCAAGCCAAT 1350
GAACAGCGAT ATACCAAGCT AAAGGAGAAG TACAGCGAGC TGGTTCAGAA 1400
CCACGCTGAC CTGCTGCGGA AGAATGCAGA GGTGACCAAA CAGGTGTCCA 1450
TGGCCAGACA AGCCCAGGTA GATTTGGAAC GAGAGAAAAA AGAGCTGGAG 1500
GATTCGTTGG AGCGCATCAG TGACCAGGGC CAGCGGAAGA CTCAAGAACA 1550
GCTGGAAGTT CTAGAGAGCT TGAAGCAGGA ACTTGGCACA AGCCAACGGG 1600
AGCTTCAGGT TCTGCAAGGC AGCCTGGAAA CTTCTGCCCA GTCAGAAGCA 1650
AACTGGGCAG CCGAGTTCGC CGAGCTAGAG AAGGAGCGGG ACAGCCTGGT 1700
GAGTGGCGCA GCTCATAGGG AGGAGGAATT ATCTGCTCTT CGGAAAGAAC 1750
TGCAGGACAC TCAGCTCAAA CTGGCCAGCA CAGAGGAATC TATGTGCCAG 1800
CTTGCCAAAG ACCAACGAAA AATGCTTCTG GTGGGGTCCA GGAAGGCTGC 1850
GGAGCAGGTG ATACAAGACG CCCTGAACCA GCTTGAAGAA CCTCCTCTCA 1900
TCAGCTGCGC TGGGTCTGCA GATCACCTCC TCTCCACGGT CACATCCATT 1950
TCCAGCTGCA TCGAGCAACT GGAGAAAAGC TGGAGCCAGT ATCTGGCCTG 2000
CCCAGAAGAC ATCAGTGGAC TTCTCCATTC CATAACCCTG CTGGCCCACT 2050
TGACCAGCGA CGCCATTGCT CATGGTGCCA CCACCTGCCT CAGAGCCCCA 2100
CCTGAGCCTG CCGACTCACT GACCGAGGCC TGTAAGCAGT ATGGCAGGGA 2150
AACCCTCGCC TACCTGGCCT CCCTGGAGGA AGAGGGAAGC CTTGAGAATG 2200
CCGACAGCAC AGCCATGAGG AACTGCCTGA GCAAGATCAA GGCCATCGGC 2250
GAGGAGCTCC TGCCCAGGGG ACTGGACATC AAGCAGGAGG AGCTGGGGGA 2300
CCTGGTGGAC AAGGAGATGG CGGCCACTTC AGCTGCTATT GAAACTTGCA 2350
CGGCCAGAAT AGAGGAGATG CTCAGCAAAT CCCGAGCAGG AGACACAGGA 2400
GTCAAATTGG AGGTGAATGA AAGGATCCTT CGTTGCTGTA CCAGCCTCAT 2450
GCAAGCTATT CAGGTGCTCA TCGTGGCCTC TAAGGACCTC CAGAGAGAGA 2500
TTGTGGAGAG CGGCAGGGGT ACAGCATCCC CTAAAGAGTT TTATGCCAAG 2550
AACTCTCGAT GGACAGAAGG ACTTATCTCA GCCTCCAAGG CTGTGGGCTG 2600
GGGAGCCACT GTCATGGTGG ATGCAGCTGA TCTGGTGGTA CAAGGCAGAG 2650
GGAAATTTGA GGAGCTAATG GTGTGTTCTC ATGAAATTGC TGCTAGCACA 2700
GCCCAGCTTG TGGCTGCATC CAAGGTGAAA GCTGATAAGG ACAGCCCCAA 2750
CCTAGCCCAG CTGCAGCAGG CCTCTCGGGG AGTGAACCAG GCCACTGCCG 2800
GCGTTGTGGC CTCAACCATT TCCGGCAAAT CACAGATCGA AGAGACAGAC 2850
AACATGGACT TCTCAAGCAT GACGCTGACA GAGATCAAAC GCCAAGAGAT 2900
GGATTCTCAG GTTAGGGTGC TAGAGCTAGA AAATGAATTG CAGAAGGAGC 2950
GTCAAAAACT GGGAGAGCTT CGGAATAAGC ACTACGAGCT TGCTGGTGTT 3000
GCTGAGGGCT GGGAAGAAGG AACAGAGGCA TCTCCACCTA CACTGCAAGA 3050
AGTGGTAACC GAAAAAGAAT AGAGCCAAAC CAACACCCCA TATGTCAGTG 3100
TAAATCCTTG TTACCTATCT CGTGTGTGTT ATTTCCCCAG CCACAGGCCA 3150
AATCCTTGGA GTCCCAGGGG CAGCCACACC ACTGCCATTA CCCAGTGCCG 3200
AGGACATGCA TGACACTTCC CAAAGATCCC TCCATAGCGA CACCCTTTCT 3250
GTTTGGACCC ATGGTCATCT CTGTTCTTTT CCCGCCTCCC TAGTTAGCAT 3300
CCAGGCTGGC CAGTGCTGCC CATGAGCAAG CCTAGGTACG AAGAGGGGTG 3350
GTGGGGGGCA GGGCCACTCA ACAGAGAGGA CCAACATCCA GTCCTGCTGA 3400
CTATTTGACC CCCACAACAA TGGGTATCCT TAATAGAGGA GCTGCTTGTT 3450
GTTTGTTGAC AGCTTGGAAA GGGAAGATCT TATGCCTTTT CTTTTCTGTT 3500
TTCTTCTCAG TCTTTTCAGT TTCATCATTT GCACAAACTT GTGAGCATCA 3550
GAGGGCTGAT GGATTCCAAA CCAGGACACT ACCCTGAGAT CTGCACAGTC 3600
AGAAGGACGG CAGGAGTGTC CTGGCTGTGA ATGCCAAAGC CATTCTCCCC 3650
CTCTTTGGGC AGTGCCATGG ATTTCCACTG CTTCTTATGG TGGTTGGTTG 3700
GGTTTTTTGG VlTTGTTTTT TTTTTTTAAG TTTCACTCAC ATAGCCAACT 3750
CTCCCAAAGG GCACACCCCT GGGGCTGAGT CTCCAGGGCC CCCCAACTGT 3800
GGTAGCTCCA GCGATGGTGC TGCCCAGGCC TCTCGGTGCT CCATCTCCGC 3850
CTCCACACTG ACCAAGTGCT GGCCCACCCA GTCCATGCTC CAGGGTCAGG 3900
CGGAGCTGCT GAGTGACAGC TTTCCTCAAA AAGCAGAAGG AGAGTGAGTG 4000
CCTTTCCCTC CTAAAGCTGA ATCCCGGCGG AAAGCCTCTG TCCGCCTTTA 4050
CAAGGGAGAA GACAACAGAA AGAGGGACAA GAGGGTTCAC ACAGCCCAGT 4100
TCCCGTGACG AGGCTCAAAA ACTTGATCAC ATGCTTGAAT GGAGCTGGTG 4150
AGATCAACAA CACTACTTCC CTGCCGGAAT GAACTGTCCG TGAATGGTCT 4200
CTGTCAAGCG GGCCGTCTCC CTTGGCCCAG AGACGGAGTG TGGGAGTGAT 4250
TCCCAACTCC TTTCTGCAGA CGTCTGCCTT GGCATCCTCT TGAATAGGAA 4300
GATCGTTCCA CTTTCTACGC AATTGACAAA CCCGGAAGAT CAGATGCAAT 4350
TGCTCCCATC AGGGAAGAAC CCTATACTTG GTTTGCTACC CTTAGTATTT 4400
ATTACTAACC TCCCTTAAGC AGCAACAGCC TACAAAGAGA TGCTTGGAGC 4450
AATCAGAACT TCAGGTGTGA CTCTAGCAAA GCTCATCTTT CTGCCCGGCT 4500
ACATCAGCCT TCAAGAATCA GAAGAAAGCC AAGGTGCTTGG ACTGTTACTG 4550
ACTTGGATCC CAAAGCAAGG AGATCATTTG GAGCTCTTGG GTCAGAGAAA 4600
ATGAGAAAGG ACAGAGCCAG CGGCTCCAAC TCCTTTCAGC CACATGCCCC 4650
AGGCTCTCGC TGCCCTGTGG ACAGGATGAG GACAGAGGGC ACATGAACAG 4700
CTTGCCAGGG ATGGGCAGCC CAACAGCACT TTTCCTCTTC TAGATGGACC 4750
CCAGCATTTA AGTGACCTTC TGATCTTGGG AAAACAGCGT CTTCCTTCTT 4800
TATCTATAGC AACTCATTGG TGGTAGCCAT CAAGCACTTC GGAATT 4846



(2) INFORMATION FOR SEQ ID NO 6

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 924 

(B) TYPE: protein 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: no 
(vi) ORIGINAL SOURCE: 
(A) ORGANISM: human 

(ix) FEATURE: Huntingtin-interacting protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6 
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Met Ser Arg Met Trp Gly His Leu Ser Glu Gly Tyr Glv Gin Leu 
1 5 10 * 15 

Cys Ser lie Tyr Leu Lys Leu Leu Arg Thr Lys Met Glu Tyr His 

20 25 30 

Thr Lys Asn Pro Arg Phe Pro Gly Asn Leu Gin Met Ser Asp Arg 

35 40 45 

Gin Leu Asp Glu Ala Gly Glu Ser Asp Val Asn Asn Phe Phe Gin 

50 55 60 

Leu Thr Val Glu Met Phe Asp Tyr Leu Glu Cys Glu Leu Asn Leu 

65 70 75 

Phe Gin Thr Val Phe Asn Ser Leu Asp Met Ser Arg Ser Val Ser 

80 85 90 

Val Thr Ala Ala Gly Gin Cys Arg Leu Ala Pro Leu lie Gin Val 

95 100 105 

lie Leu Asp Cys Ser His Leu Tyr Asp Tyr Thr Val Lys Leu Leu 

110 115 120 

Phe Lys Leu His Ser Cys Leu Pro Ala Asp Thr Leu Gin Gly His 

125 130 135 

Arg Asp Arg Phe Met Glu Gin Phe Thr Lys Leu Lys Asp Leu Phe 

140 145 150 

Tyr Arg Ser Ser Asn Leu Gin Tyr Phe Lys Arg Leu lie Glnlle 

155 160 165 

Pro Gin Leu Pro Glu Asn Pro Pro Asn Phe Leu Arg Ala Ser Ala 

170 175 180 

Leu Ser Glu His lie Ser Pro Val Val Val lie Pro Ala Glu Ala 

185 190 195 

Ser Ser Pro Asp Ser Glu Pro Val Leu Glu Lys Asp Asp Leu Met 

200 205 210 

Asp Met Asp Ala Ser Gin Gin Asn Leu Phe Asp Asn Lys Phe Asp 

215 220 225 

Asp lie Phe Gly Ser Ser Phe Ser Ser Asp Pro Phe Asn Phe Asn 

230 235 240 

Ser Gin Asn Gly Val Asn Lys Asp Glu Lys Asp His Leu lie Glu 

245 250 255 

Arg Leu Tyr Arg Glu lie Ser Gly Leu Lys Ala Gin Leu Glu Asn 

260 265 270 
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Met Lvs Thr Glu Ser Gin Arg Val Val Leu Gin Leu Lys Gly His 
^ 275 280 285 

Val Ser Glu Leu Glu Ala Asp Leu Ala Glu Gin Gin His Leu Arg 

290 295 300 

Gin Gin Ala Ala Asp Asp Cys Glu Phe Leu Arg Ala Glu Leu Asp 

305 310 315 

Glu Leu Arg Arg Gin Arg Glu Asp Thr Glu Lys Ala Gin Arg Ser 

320 325 330 

Leu Ser Glu He Glu Arg Lys Ala Gin Ala Asn Glu Gin Arg Tyr 

335 340 345 

Ser Lvs Leu Lys Glu Lys Tyr Ser Glu Leu Val Gin Asn His Ala 

350 355 360 

ASD Leu Leu Arg Lys Asn Ala Glu Val Thr Lys Gin Val Ser MeC 
^ 365 370 375 

Ala Arg Gln Ala Gln Val Asp Leu Glu Arg Glu Lys Lys Glu Leu

380 385 390

Glu Asp Ser Leu Glu Arg Ile Ser Asp Gln Gly Gln Arg Lys Thr

395 400 405

Gln Glu Gln Leu Glu Val Leu Glu Ser Leu Lys Gln Glu Leu Gly

410 415 420

Thr Ser Gln Arg Glu Leu Gln Val Leu Gln Gly Ser Leu Glu Thr

425 430 435

Ser Ala Gln Ser Glu Ala Asn Trp Ala Ala Glu Phe Ala Glu Leu

440 445 450

Glu Lys Glu Arg Asp Ser Leu Val Ser Gly Ala Ala His Arg Glu

455 460 465

Glu Glu Leu Ser Ala Leu Arg Lys Glu Leu Gln Asp Thr Gln Leu

470 475 480

Lys Leu Ala Ser Thr Glu Glu Ser Met Cys Gln Leu Ala Lys Asp

485 490 495

Gln Arg Lys Met Leu Leu Val Gly Ser Arg Lys Ala Ala Glu Gln

500 505 510

Val Ile Gln Asp Ala Leu Asn Gln Leu Glu Glu Pro Pro Leu Ile

515 520 525

Ser Cys Ala Gly Ser Ala Asp His Leu Leu Ser Thr Val Thr Ser

530 535 540

Ile Ser Ser Cys Ile Glu Gln Leu Glu Lys Ser Trp Ser Gln Tyr

545 550 555

Leu Ala Cys Pro Glu Asp Ile Ser Gly Leu Leu His Ser Ile Thr

560 565 570

Leu Leu Ala His Leu Thr Ser Asp Ala Ile Ala His Gly Ala Thr

575 580 585

Thr Cys Leu Arg Ala Pro Pro Glu Pro Ala Asp Ser Leu Thr Glu

590 595 600

Ala Cys Lys Gln Tyr Gly Arg Glu Thr Leu Ala Tyr Leu Ala Ser

605 610 615 

Leu Glu Glu Glu Gly Ser Leu Glu Asn Ala Asp Ser Thr Ala Met 

620 625 630 

Arg Asn Cys Leu Ser Lys Ile Lys Ala Ile Gly Glu Glu Leu Leu

635 640 645

Pro Arg Gly Leu Asp Ile Lys Gln Glu Glu Leu Gly Asp Leu Val

650 655 660

Asp Lys Glu Met Ala Ala Thr Ser Ala Ala Ile Glu Thr Cys Thr

665 670 675

Ala Arg Ile Glu Glu Met Leu Ser Lys Ser Arg Ala Gly Asp Thr

680 685 690

Gly Val Lys Leu Glu Val Asn Glu Arg Ile Leu Arg Cys Cys Thr

695 700 705

Ser Leu Met Gln Ala Ile Gln Val Leu Ile Val Ala Ser Lys Asp

710 715 720

Leu Gln Arg Glu Ile Val Glu Ser Gly Arg Gly Thr Ala Ser Pro

725 730 735

Lys Glu Phe Tyr Ala Lys Asn Ser Arg Trp Thr Glu Gly Leu Ile

740 745 750 

Ser Ala Ser Lys Ala Val Gly Trp Gly Ala Thr Val Met Val Asp 

765 770 775 

Ala Ala Asp Leu Val Val Gln Gly Arg Gly Lys Phe Glu Glu Leu

780 785 790

Met Val Cys Ser His Glu Ile Ala Ala Ser Thr Ala Gln Leu Val

795 800 805 

Ala Ala Ser Lys Val Lys Ala Asp Lys Asp Ser Pro Asn Leu Ala 

810 815 820

Gln Leu Gln Gln Ala Ser Arg Gly Val Asn Gln Ala Thr Ala Gly

825 830 835

Val Val Ala Ser Thr Ile Ser Gly Lys Ser Gln Ile Glu Glu Thr

840 845 850

Asp Asn Met Asp Phe Ser Ser Met Thr Leu Thr Gln Ile Lys Arg

855 860 865

Gln Glu Met Asp Ser Gln Val Arg Val Leu Glu Leu Glu Asn Glu

870 875 880

Leu Gln Lys Glu Arg Gln Lys Leu Gly Glu Leu Arg Lys Lys His

885 890 895

Tyr Glu Leu Ala Gly Val Ala Glu Gly Trp Glu Glu Gly Thr Glu

900 905 910

Ala Ser Pro Pro Thr Leu Gln Glu Val Val Thr Glu Lys Glu

915 920 924

CLAIMS



1. A cDNA molecule comprising the sequence given by Seq. ID No. 1.

2. A cDNA molecule comprising the sequence given by Seq. ID No. 5.

3. A polypeptide comprising the sequence given by Seq. ID No. 2.

4. A polypeptide comprising the sequence given by Seq. ID No. 6.



5. A chimeric gene or plasmid comprising at least nucleotides 314 to 1955
of the Huntington's Disease gene and an activating or DNA binding domain suitable for use in
a yeast multi-hybrid assay.

6. The chimeric gene or plasmid according to claim 5, wherein the
Huntington's Disease gene encodes a polyglutamine tract having a length of 35 or fewer
residues.

7. The chimeric gene or plasmid according to claim 5, wherein the
Huntington's Disease gene encodes a polyglutamine tract having a length of 36 or more
residues.



8. A method for ameliorating the effects of Huntington's disease in a
patient expressing Huntingtin protein with an expanded CAG repeat region, comprising the
step of increasing the amount of an expressed HD-interacting polypeptide in the brain of the
patient, wherein the expressed HD-interacting polypeptide interacts less well with expanded
Huntingtin than with Huntingtin having a CAG repeat region containing 15 to 35 repeats and
facilitates the incorporation of Huntingtin into brain cell membranes.



9. The method according to claim 8, wherein the expressed HD-interacting
polypeptide comprises the sequence given by Seq. ID No. 2.

10. An antibody which binds to a polypeptide having the sequence given by
Seq. ID No. 2.

11. The antibody of claim 10, wherein the antibody binds to amino acids
76-91 of the polypeptide having the sequence shown in Seq. ID No. 2.

12. An expression vector for expression of a gene in a mammalian host
comprising a region encoding an HD-interacting polypeptide, wherein the HD-interacting
polypeptide interacts less well with expanded Huntingtin than with Huntingtin having a CAG
repeat region containing 15 to 35 repeats and facilitates the incorporation of Huntingtin into
brain cell membranes.

13. An expression vector for expression of a gene in a mammalian host
comprising a region that is the same as or complementary to Seq. ID No. 1.

14. An expression vector for expression of a gene in a mammalian host
comprising a region that is the same as or complementary to Seq. ID No. 5.

15. The expression vector according to any of claims 12-14, further
comprising a region encoding Huntingtin having a polyglutamine tract of 35 or fewer.

16. An oligonucleotide probe having a length of from 15-40 bases which
specifically and selectively hybridizes with the cDNA given by Seq. ID No. 1 or a sequence
complementary thereto.